Detection of Fusion Events Using Replicate PCR Reactions

ABSTRACT

The present disclosure relates to methods for detecting and targeting genomic rearrangements, in particular gene fusion events, by targeting a DNA molecule of interest with a set or pool of primers, wherein the forward primers and reverse primers produce a PCR amplification product when a genomic rearrangement is present. The present disclosure also relates to methods of bioinformatic analysis to determine whether or not the detection of an amplification product from the selective PCR is actually indicative of the presence of a gene fusion. The present disclosure also relates to related methods of diagnosis and treatment of diseases and conditions associated with such genomic rearrangements, in particular cancers, such as lung cancer.

CROSS-REFERENCING

This application claims the benefit of GB1709675.1, filed on Jun. 16, 2018, which application is incorporated herein in its entirety for all purposes.

BACKGROUND

The present disclosure relates to methods for detecting genomic rearrangements, in particular gene fusion events, as well as related methods of diagnosis and treatment of diseases and conditions associated with such genomic rearrangements, in particular cancers, such as lung cancer.

Genetic or chromosomal rearrangements are a type of chromosomal abnormality in which the normal order of the genetic code has been altered. A common genomic rearrangement that is associated with cancer is genetic fusion. A gene fusion event may occur in cancerous or pre-cancerous cells and can be detected in patients to help classify the cancer and determine appropriate treatments.

Existing methods for detecting gene fusions include fluorescence in situ hybridization (FISH), RT-PCR, long range PCR and hybridisation capture followed by next generation sequencing. FISH uses DNA or RNA probes, tagged with targets for antibodies, with fluorophores or with biotin. These probes are applied to an interphase or metaphase chromosome preparation in order to detect either the co-localisation of two genes that are typically separate in a nucleus or the breaking apart of a gene's signal, indicating its fusion to another region of the genome. An example of this approach is the detection of the BCR/ABL fusion by FISH and the use of this to monitor response to therapy in chronic myeloid leukemia (Dewald G W et al. Blood. American Society of Hematology; 1998; 91: 3357-65). FISH can only be applied to cells or tissue and therefore cannot be used to detect gene fusions in cell free circulating nucleic acids or in DNA already extracted from tissue. FISH also requires intact nuclei and visual assessment of individual cells, and cannot give the sequence of the breakpoint.

RT-PCR can be used to amplify the messenger RNA (mRNA) transcript of a fusion gene and specifically detect its presence. Reverse transcription is used to convert RNA to cDNA, followed by PCR using fusion specific primers to amplify the fusion of interest. As intronic sequences are spliced out in the generation of mRNA, this is relatively simple and typically requires just one pair of primers or a small multiplex of primers. The products of such a PCR can then be detected in multiple ways, such as through gel electrophoresis with intercalating agents like ethidium bromide or using a fluorescent probe with real time or digital PCR (see, for example, U.S. Pat. No. 4,874,853). However, this approach can only be applied to mRNA and it is not feasible to identify the breakpoints that have occurred in the genome (DNA). mRNA typically has a short half-life and is therefore a challenging biomarker in heavily degraded samples such as FFPE and circulating RNA (ctRNA).

An alternative is to directly detect the fusion in genomic DNA. A significant challenge with this approach is that the fusions will often occur throughout large intronic spaces. One solution to this is long range PCR. By using a limited number of primers tiled typically 500 bp to 10,000 bp apart throughout each gene of interest it is possible to set up multiple singleplex PCR reactions in order to amplify the fusion genes (Lawson A R J et al. Genome Res. 2011; 21: 505-14; EP1914240; Duployez N et al. Am J Hematol. 2014; 89: 610-615). To reduce the number of reactions required it has been shown that this long-range PCR can be performed in multiplex. Metzler et al. developed a multiplex of 25 forward primers and 5 biotinylated reverse primers used to amplify the translocation t(4;11), followed by positive selection for PCR products containing one of the biotin-labelled primers, then a second PCR using additional target specific primers, in a method called Asymmetric multiplex PCR (Metzler M et al. Br J Haematol. 2004; 124: 47-54). In this method their primers were typically 1,000 bp apart or greater. The limitation with these approaches is the requirement for DNA greater than 500 bp in length and therefore they are not suitable for fragmented DNA such as FFPE or cfDNA. These long-range PCR methodologies are also not compatible with Next Generation Sequencing Technologies due to the short-read length (<500 bp) of most next generation sequencing platforms without further complex steps such as fragmenting the DNA then ligating on adaptors. This methodology also requires a potentially large number of individual PCR reactions to assess each sample or complex steps such as positive selection using biotin-labelled primers in order to multiplex such a reaction.

Alternatively, targeted enrichment and detection of fusions can be achieved by hybridisation, whereby a biotinylated probe (~120 bases) with complementarity to the target of interest is hybridised to the DNA under investigation to selectively recover and thus enrich for regions of interest with the use of streptavidin coated magnetic beads. In this case, regions of interest are genomic regions that are known to undergo gene fusions. With this hybridisation approach, genomic regions of interest are recovered whether or not they have undergone a gene fusion event. Thus, significant sequencing capacity is expended decoding such regions even when no fusion event has occurred. Additionally, with this approach, prior to enrichment, DNA has to be extensively processed (consisting of end-repair, A-tailing and ligation). As these steps are highly inefficient, ~70% of starting material is lost prior to hybridisation, limiting the ability to detect fusions present at low allelic frequencies. Finally, this approach is time consuming, normally requiring overnight incubation of target and probe to enable hybridisation.

Target enrichment by primer extension enables fusion gene detection with knowledge of only one of the two fusion partners. However, this approach is time consuming and requires the ligation of universal adapters to DNA ends. The inefficiency of ligation limits the ability to detect fusions present at low allelic frequencies. As with a hybridisation approach, genomic regions of interest are recovered whether or not they have undergone a gene fusion event with this approach and thus, significant sequencing capacity is needed to assess these large regions if high sensitivity is required.

US20160319365 discloses methods for detecting chromosomal rearrangements using hybridisation probes. However, such techniques require probes to be designed for the target region, in addition to PCR primers for the target region. Therefore, the likelihood of detecting a fusion event is diminished as an additional enrichment step is performed. It also increases complexity and cost of the workflow as an additional hybridisation probe is required.

There remains in the art a need for a method of detecting gene fusions in an efficient manner, such that gene fusion events can be detected even when they occur at a low allelic frequency, particularly when such fusions occur in highly fragmented genomic DNA.

These and other features of the present teachings are set forth herein.

SUMMARY OF THE INVENTION

The present disclosure relates to methods for targeting genomic rearrangement, in particular gene fusion events, by targeting a DNA molecule of interest with a set or pool of primers, wherein the forward primers and reverse primers produce a PCR amplification product when a genomic rearrangement is present. This is achieved by targeting a first region with the forward primers and targeting a second, different, region with the reverse primers. The forward and reverse primers produce an amplification product when they anneal in sufficient proximity to each other. Hence, an amplification product will be produced when a genomic rearrangement has occurred to bring the first and second regions into sufficient proximity. The amplification product is then sequenced to identify the presence and position of the genomic rearrangement. By combining selective amplification and sequence determination it is possible to identify a genomic rearrangement at low allelic fraction even if the PCR produces off-target amplification. The methods disclosed herein do not require a further enrichment step, such as enrichment comprising hybridisation to a probe. The sequence of a reaction product is indicative of the presence of a genomic rearrangement, since the sequence read can be used to directly detect (and characterise) the fusion. Multiple genomic rearrangements can be detected in a single reaction by using multiple sets or pools of primers to detect the genomic rearrangements, wherein each paired set or pool of primers is designed to amplify a different genomic rearrangement (if present). The methods can also be combined with methods to determine the presence or absence of genetic alterations that are not genomic rearrangements, such as single nucleotide polymorphisms (SNPs). This can be achieved by using additional primer pairs that act both as a positive control and to further characterise a disease or disorder or a patient from whom a sample has been taken and analysed. The present disclosure does not require end repair or ligation to enrich for targets of interest and therefore a further advantage is that there is no loss of starting material due to processing prior to fusion detection.

The methods disclosed herein generally comprise:

a. contacting a sample comprising a DNA molecule of interest (DMOI) with one or more forward primers and one or more reverse primers, wherein the or each of the forward primers is specific for a first region of interest, and the or each of the reverse primers is specific for a second, different, region of interest; and
b. conducting PCR.

In a first aspect, there is provided a method of detecting a genomic fusion event, comprising:

a. contacting a sample comprising DNA molecules of interest (DMOIs) with a pool of at least 20 region-specific forward primers and a pool of at least 20 region-specific reverse primers, wherein:
    i. each of the forward primers in the forward primer pool comprises a sequence specific for a first region of interest and a first primer binding site; and
    ii. each of the reverse primers in the reverse primer pool comprises a sequence specific for a second, different, region of interest and a second primer binding site;
b. amplifying the DMOIs using the region-specific primers;
c. conducting PCR using forward primers that target the first primer binding site and reverse primers that target the second primer binding site;
d. sequencing the PCR amplification product to provide a library of sequence reads, wherein the sequence reads comprise the sequence of a forward and/or reverse primer used in step (a); and
e. using the sequence reads provided in step (d) to determine the sequence of the genomic fusion between the first and second regions of interest.

In some embodiments, the first primer binding site is the same in each of the at least 20 region-specific forward primers and the second primer binding site is the same in each of the at least 20 region-specific reverse primers. The first and second primer binding sites may be different from each other. The first and second primer binding sites act as universal primer binding sites in a subsequent PCR.

Step (b) may comprise multiplex PCR, which is also a selective PCR, since an exponential amplification product will be produced in the presence of a genomic rearrangement event that brings the first and second regions into sufficient proximity to each other. Generally, the genomic rearrangement is a gene fusion.

The methods comprise sequencing the final amplification product. Hence in some embodiments the methods comprise decoding the genomic rearrangement (e.g. gene fusion) by sequencing. In some embodiments, the method comprises multiple PCR reactions, for example a first PCR using the region-specific primers and a second PCR using primers specific for sequences introduced into the amplicons by the primers used in a first PCR.

In a second aspect, there is provided a method, comprising:

a. providing a sample from a patient, said sample comprising one or more DMOIs; and
b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein.

The method may be a method of diagnosing or characterising cancer, a method of determining cancer prognosis, a method of determining cancer remission or relapse, a method of detecting progression of cancer, or a method of determining the presence or absence of residual cancer. The method may comprise extracting, isolating or enriching for the DMOI from the patient sample prior to determining the presence or absence of a genomic rearrangement. However, an advantage of the methods and kits disclosed herein is that enrichment of the sample for the DMOI is not required, and so the methods do not involve the loss of sensitivity due to inefficient enrichment methods.

In a third aspect there is provided a method of treating a disease, such as cancer, comprising:

a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein; and
c. administering a therapy to the patient, such as a cancer therapy.

In a fourth aspect there is provided a method of determining a treatment regimen for a patient, such as a cancer patient or a patient suspected of having cancer, comprising:

a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein; and
c. selecting a treatment regimen for the patient according to the presence or absence of a genomic rearrangement in the one or more DMOIs.

The method may further comprise administering said treatment regimen to the patient.

In a fifth aspect there is provided a method of predicting a patient's responsiveness to a cancer treatment, comprising:

a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein; and
c. predicting a patient's responsiveness to a cancer treatment according to the presence or absence of a genomic rearrangement in the one or more DMOIs.

In another embodiment, there is provided a method of early detection/diagnosis of cancer, comprising:

a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein; and
c. diagnosing a patient as having cancer if a genomic rearrangement event is detected.

The methods disclosed herein are combined with sequencing of the amplification product of the forward and reverse primers to detect the genomic rearrangement.

The methods disclosed herein therefore comprise sequencing the amplification product and determining the sequence of the DNA that has been amplified (decoding the DMOI by sequencing). This enables non-specific (off-target) amplification to be discounted and true genomic rearrangements (such as gene fusions) to be identified and characterised. The methods allow the identification of a gene breakpoint in a gene fusion and enable a disease, in particular cancer, to be characterised. The methods can also be used to assess and/or monitor cancer progression in a subject, optionally a subject that has received or is receiving treatment for the cancer. The methods can also predict whether or not a patient will respond to a given cancer treatment.

In a further aspect there is provided a kit of parts comprising a plurality of forward primers and a plurality of reverse primers, wherein the forward primers are each specific for a first region of interest, and the reverse primers are each specific for a second, different, region of interest.

In a still further aspect there is provided a pool of forward and reverse primers, comprising a plurality of forward primers specific for a first region of interest and a plurality of reverse primers specific for a second region of interest, wherein the first and second regions of interest are different to each other. The kits and primer pools disclosed herein can be used in the methods disclosed herein to determine the presence or absence of a genomic rearrangement.

In a further embodiment there is provided a reaction mixture comprising:

a. a kit or pool of primers disclosed herein; and
b. a sample from a patient containing a DMOI derived from a neoplasm or a cancer.

In a further embodiment there is provided the kits or primer pools disclosed herein for use in the diagnosis of a disease such as cancer.

In a still further embodiment, there is provided a method for determining the presence or absence of a gene fusion in a DMOI, the method comprising:

a. providing the sequence of a DMOI as a sequence read;
b. identifying in the sequence read the presence of at least one forward primer binding site and the presence of at least one reverse primer binding site from a population of forward and reverse primers;
c. determining the corresponding genomic locations of the forward and reverse primer binding sites by reference to the sequences of the forward and reverse primer binding sites and the sequences downstream and adjacent to the forward and reverse primer binding sites in the sequence read; and
d. determining the presence or absence of a gene fusion in the DMOI.
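By way of illustration only, the read-analysis steps above can be sketched in Python. This is a minimal sketch under stated assumptions and is not the pipeline of the disclosure: the names `PrimerSite`, `find_site` and `call_fusion`, the toy sequences and the coordinates are all hypothetical, exact string matching stands in for whatever matching a real implementation would use, and the read is assumed to begin with a forward primer binding site and end with a reverse primer binding site written in read orientation.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class PrimerSite(NamedTuple):
    """Hypothetical record for one region-specific primer binding site."""
    name: str
    region: str     # region of interest, e.g. "EML4" or "ALK"
    position: int   # illustrative coordinate of the site in the reference
    site: str       # primer binding site as it appears in the read
    adjacent: str   # reference bases expected next to the site in the read


def find_site(read: str, sites: Sequence[PrimerSite],
              at_start: bool) -> Optional[PrimerSite]:
    """Steps (b) and (c): find a site whose adjacent bases also match."""
    for s in sites:
        if at_start and read.startswith(s.site + s.adjacent):
            return s
        if not at_start and read.endswith(s.adjacent + s.site):
            return s
    return None


def call_fusion(read: str, forward_sites: Sequence[PrimerSite],
                reverse_sites: Sequence[PrimerSite]
                ) -> Optional[Tuple[Tuple[str, int], Tuple[str, int]]]:
    """Step (d): report both genomic locations, or None if either is missing."""
    fwd = find_site(read, forward_sites, at_start=True)
    rev = find_site(read, reverse_sites, at_start=False)
    if fwd is None or rev is None:
        return None  # treated here as off-target or uninformative
    return (fwd.region, fwd.position), (rev.region, rev.position)


if __name__ == "__main__":
    forward = [PrimerSite("F1", "EML4", 100, "ACGTAC", "GGA")]
    reverse = [PrimerSite("R1", "ALK", 900, "TTGCAA", "CCT")]
    print(call_fusion("ACGTACGGA" + "TTTT" + "CCTTTGCAA", forward, reverse))
```

Requiring the adjacent bases to match, and not only the primer sequence itself, is what lets the sketch discard a read that carries a primer but arose from off-target priming.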

BRIEF DESCRIPTION OF THE FIGURES

The skilled artisan will understand that the drawings, described below, are for illustration purposes only. The drawings are not intended to limit the scope of the present teachings in any way.

FIG. 1: Cell line fusion mix (custom product Horizon Discovery Group) consisting of a mixture of EML4-ALK fusion-positive DNA and normal (fusion-negative) DNA was serially diluted to achieve allelic fractions of 1%, 0.5%, 0.25%, 0.125% and 0.0625%. Fusion-negative human placental DNA (Bioline) was added to maintain the genome input copy number constant at 4000 input copies. Fusion enrichment is achieved by selective PCR and a second PCR ensures the addition of barcoded Illumina adapters. Fusion genes are decoded by next-generation sequencing, e.g. on the NextSeq 500 Illumina platform. Sequencing data is screened for the presence of fusion genes using the described bioinformatic pipeline and data is published in a fusion detection report.

FIG. 2: The sequence of the EML4-ALK gene fusion in the fusion-positive material from Horizon is known (expected EML4-ALK breakpoint). A. Different combinations of adjacent primers are able to amplify the gene fusion. Two types of reads, containing the fusion breakpoint, are obtained as a result of amplification of the fusion by different primer pairs. Both reads contain the expected fusion gene sequence as well as flanking DNA sequences of different lengths. From top to bottom: SEQ ID NOS: 1, 2, and 3. B. Fusion detection was performed on three replicates at each of the allelic frequencies. Fusions were detected at all allelic fractions in all replicates.

FIG. 3: Median read depth obtained for an ALK-EML4 fusion at 0.0625, 0.125, 0.25, 0.5 and 1% allelic fraction. The median of the number of fusion reads detected in the three replicates was calculated and plotted against the different allelic fractions.

FIG. 4: Experiment determining ROS1 Fusion. To test the detection of fusion genes between ROS1 and CD74, a 500 bp fragment of synthetic DNA (gBlock) that contains the sequence of a published ROS1-CD74 gene fusion was synthesized by IDT. The synthetic gBlock was fragmented by sonication (Covaris) to an average of 150 bp and added to sheared fusion-negative human placental DNA to achieve an allelic fraction of 1% at 4000 input copies. The fusion gene was amplified by selective PCR and decoded on the NextSeq 500 (Illumina). Sequencing data is screened for the presence of fusion genes using the described bioinformatic pipeline and data is published in a fusion detection report.

FIG. 5: The sequence of the synthesised ROS1-CD74 fusion-containing gBlock is depicted. The gBlock was fragmented to an average of 150 bp prior to inputting into the assay. SEQ ID NO: 5.

FIG. 6: Next Generation sequencing reads obtained from ROS1-CD74 gBlock. A. Two combinations of forward and reverse primers amplified the fusion gene (SEQ ID NOS: 5 and 6). B. The sequence of the read detected for each is shown. The sequence of the read is shown in bold letters within the gBlock sequence (SEQ ID NOS: 7 and 8).

FIG. 7: Example 2-step workflow showing multiplex PCR conducted with primers that tile genes of interest at intervals (75 bp in this example). Gene A is tiled only with forward primers and Gene B is tiled only with reverse primers. The primers contain a universal primer site (UPS) (for example part of an Illumina adaptor sequence) at the 5′ end and a gene specific sequence at the 3′ end. A. In normal cells that do not have a gene fusion, PCR amplification does not occur as the distance between the genes is too great. B. In fusion-positive cancer cells, Genes A and B are brought into close proximity with one another (for example within 150 bp) so a product is generated by PCR amplification. C. The presence of the UPSs (such as UPSs incorporated in partial sequencing adaptors, such as partial Illumina adaptors) allows the construction of complete sequencing adaptors (such as Illumina adaptors) in a second round of PCR. This second round of PCR uses primers that anneal to the UPS element of the original primers (at the 3′ end of the primer) and contain the rest of the sequencing adaptor (at the 5′ end of the primer).

FIG. 8: Bioinformatic method for calling gene fusions: Amplicons are generated by two primer pairs amplifying a fusion event, which are then sequenced (dotted line indicates read) by NGS (black arrows indicate sequencing primers). The analysis method involves determining the minimum number of base pairs that need to be sequenced (for each primer site) to uniquely match a target region. A strong anchor has sufficient base pairs sequenced to uniquely match a target region. A weak anchor does match a target region but also matches other regions in the reference genome, and therefore does not uniquely match the target region. The method uses the known primer binding locations to determine the expected sequence within the reads, which removes the need for aligning reads to the entire reference genome. A. An amplicon has two strong anchors, with both the ALK and EML4 portions of the read uniquely matching an ALK or EML4 reference sequence. B. An amplicon has one strong anchor and one weak anchor. The ALK portion of the read uniquely matches a target region; the EML4 portion does not uniquely match the reference genome.
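The distinction between strong and weak anchors described for FIG. 8 can be illustrated with a short Python sketch. It is offered only as an assumption-laden toy: the function names `count_occurrences`, `classify_anchor` and `min_unique_length` are hypothetical, the "reference genome" is a small dictionary of strings, only one strand is searched, and exact matching is used in place of any mismatch-tolerant comparison.

```python
from typing import Dict, Optional


def count_occurrences(seq: str, text: str) -> int:
    """Count occurrences of seq in text, including overlapping ones."""
    count = 0
    start = text.find(seq)
    while start != -1:
        count += 1
        start = text.find(seq, start + 1)
    return count


def classify_anchor(anchor: str, target: str, reference: Dict[str, str]) -> str:
    """Return 'strong', 'weak' or 'none' for a sequenced anchor.

    strong: matches the target region once and nowhere else in the reference
    weak:   matches the target region but is not unique in the reference
    none:   does not match the target region at all
    """
    in_target = count_occurrences(anchor, reference[target])
    if in_target == 0:
        return "none"
    elsewhere = sum(count_occurrences(anchor, seq)
                    for name, seq in reference.items() if name != target)
    return "strong" if in_target == 1 and elsewhere == 0 else "weak"


def min_unique_length(expected: str, target: str,
                      reference: Dict[str, str]) -> Optional[int]:
    """Fewest leading bases of `expected` that yield a strong anchor."""
    for n in range(1, len(expected) + 1):
        if classify_anchor(expected[:n], target, reference) == "strong":
            return n
    return None


if __name__ == "__main__":
    toy_reference = {"ALK": "AAACCCGGGTTT", "other": "AAACCCTTT"}
    print(classify_anchor("AAACCC", "ALK", toy_reference))
    print(min_unique_length("AAACCCGGG", "ALK", toy_reference))
```

In this toy, `min_unique_length` plays the role of the minimum number of base pairs that need to be sequenced at a primer site before the match to the target region becomes unique.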

DETAILED DESCRIPTION OF THE INVENTION

Before the various embodiments are described, it is to be understood that the teachings of this disclosure are not limited to the particular embodiments described, and as such can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present teachings will be limited only by the appended claims.

The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described in any way. While the present teachings are described in conjunction with various embodiments, it is not intended that the present teachings be limited to such embodiments. On the contrary, the present teachings encompass various alternatives, modifications, and equivalents, as will be appreciated by those of skill in the art.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present teachings, some exemplary methods and materials are now described.

The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present claims are not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided can be different from the actual publication dates, which may need to be independently confirmed.

As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which can be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present teachings. Any recited method can be carried out in the order of events recited or in any other order which is logically possible. All patents and publications, including all sequences disclosed within such patents and publications, referred to herein are expressly incorporated by reference.

Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Markham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, N.Y. (1991) provide one of skill with the general meaning of many of the terms used herein. Still, certain terms are defined below for the sake of clarity and ease of reference.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. For example, the term “a primer” refers to one or more primers, i.e., a single primer and multiple primers. It is further noted that the claims can be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

The present disclosure provides novel methods for detecting and determining the sequence of genomic rearrangements, in particular gene fusions, by using primers to target regions that are usually too far apart in a genome, chromosome or gene for an amplification product to be produced in a normal PCR. This is achieved by conducting a selective PCR comprising providing a forward primer (or preferably a pool of at least 20 forward primers) specific for a first region of interest and a reverse primer (or preferably a pool of at least 20 reverse primers) specific for a second region of interest, wherein the first and second regions of interest are different. In particular, the two regions of interest are located at distinct positions in a genome, chromosome or gene such that in a normal sample (i.e. a sample comprising DNA molecules in which a genomic rearrangement has not occurred) the two regions are too far apart for an amplification product to be produced in a normal PCR (for example, more than 1 kb apart or on different chromosomes). For example, the two regions could be two different genes. If a genomic rearrangement event occurs, such as a gene fusion event, it brings the two regions of interest into proximity with each other such that at least one pair of the forward and reverse primers is sufficiently close to each other to produce an amplification product in a PCR, even when the PCR only amplifies small DNA fragments (for example fragments up to 500 bp in length). The sequence of the amplification product is then determined to confirm the location of the primers used in the amplification reaction and therefore the presence and location of the genomic rearrangement in the sample. Multiple regions and multiple genomic rearrangements can be targeted in a single reaction. In preferred embodiments the primers are designed to target DNA sequences in the respective regions of interest that are not overlapping. Regions of interest can be large and span several kilobases, or even megabases, without impacting sequencing cost, as sequencing products will only be generated in a small number of cases (samples in which a genomic rearrangement has occurred, i.e. fusion-positive samples) as well as from a small region of the genome (where the fusion event occurred), no matter how large the regions of interest covered by the forward and reverse primers are.
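The selectivity of this PCR can be pictured with a small Python model. The sketch below is illustrative only and rests on simplifying assumptions that are not part of the disclosure: primers are reduced to coordinates on a single template molecule, the function name `amplifiable_pairs` and all coordinates are hypothetical, and a product is assumed to form whenever a reverse primer lies downstream of a forward primer within a maximum amplicon size (set by the `max_amplicon` parameter, given an illustrative default of 500 bp).

```python
from typing import List, Sequence, Tuple


def amplifiable_pairs(forward_starts: Sequence[int],
                      reverse_ends: Sequence[int],
                      max_amplicon: int = 500) -> List[Tuple[int, int, int]]:
    """List (forward_start, reverse_end, size) for pairs that could amplify.

    forward_starts: 5' coordinate of each forward primer on the template
    reverse_ends:   last template coordinate covered by each reverse primer
    A pair is kept only if the reverse primer is downstream of the forward
    primer and the implied amplicon is no longer than max_amplicon.
    """
    pairs = []
    for f in forward_starts:
        for r in reverse_ends:
            size = r - f
            if 0 < size <= max_amplicon:
                pairs.append((f, r, size))
    return pairs


if __name__ == "__main__":
    forward = [0, 75, 150]  # forward primers tiled every 75 bp
    # Normal template: the second region lies about 100 kb away, no product.
    print(amplifiable_pairs(forward, [100_200, 100_275]))
    # Rearranged template: the second region now sits next to the first.
    print(amplifiable_pairs(forward, [200, 275], max_amplicon=150))
```

Because only closely spaced pairs yield a product, the number of amplicons in this model depends on the rearrangement and not on how large the tiled regions are, which mirrors the point made above about sequencing cost.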

The term “primer binding site” in the context of a forward primer or a reverse primer refers to the 5′ tail of the forward primer or reverse primer, which is not specific for the region of interest and which, when copied, provides a sequence to which another primer can bind. Such a primer binding site is typically 8-30 nucleotides in length, although a primer binding site can be longer or shorter in some instances. Likewise, the site to which a forward or reverse primer binds is typically at least 8-30 contiguous nucleotides in length.

The term “primer binding site” can also be used in the context of the sequence read of the DMOI to denote the sequence to which the corresponding forward or reverse primer can bind. For example, in the context of analysis of the sequence reads, the forward and reverse primer binding sites refer to the region-specific sequence in the region-specific primers. The at least one forward primer binding site and the at least one reverse primer binding site could be the complement of the region-specific sequence from the corresponding forward or reverse region-specific primer or could have the same sequence as the region-specific sequence of a corresponding forward or reverse region-specific primer, depending on the direction of the sequence read. The skilled person is able to take such variations into account when conducting their analysis.
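As a hedged illustration of how read direction might be taken into account, the Python sketch below searches a read for a region-specific sequence either as written or as its reverse complement (the form in which the complement of a primer sequence appears when a read is written 5′ to 3′). The function names `reverse_complement` and `locate_primer_site` and the example sequences are hypothetical, and exact matching is assumed for simplicity.

```python
from typing import Optional, Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_primer_site(read: str,
                       region_specific_seq: str) -> Optional[Tuple[int, str]]:
    """Find a primer's region-specific sequence in a read.

    Returns (index, "same") if the read contains the sequence as written,
    (index, "complement") if it contains the reverse complement instead,
    or None if neither orientation is present.
    """
    idx = read.find(region_specific_seq)
    if idx != -1:
        return idx, "same"
    idx = read.find(reverse_complement(region_specific_seq))
    if idx != -1:
        return idx, "complement"
    return None


if __name__ == "__main__":
    print(locate_primer_site("TTAACGAA", "AACG"))
    print(locate_primer_site("GGCGTTGG", "AACG"))
```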

In any embodiment, all of the primers in a “pool of region-specific forward primers” can bind to the same strand, i.e., the top strand or the bottom strand, but not both strands, of a region of interest in a reference genome, where the term “reference genome” refers to a genome whose sequence is at least partially known. The sequences of several reference genomes, including the human genome, have been deposited at NCBI's GenBank database and other databases. A reference genome can be a “wild type” sequence. Likewise, all of the primers in a “pool of region-specific reverse primers” can also bind to the same strand, i.e., the top strand or the bottom strand, but not both strands, of a region of interest in the reference genome. The forward primers may bind to the same or different strand to the reverse primers.

The first region of interest and the second region of interest should be on different chromosomes or sufficiently distanced in the reference genome so that no amplification products are expected unless there is a rearrangement in which the first region of interest and the second region of interest become closely linked to one another. In some embodiments, the first and second regions of interest should be on different chromosomes in the reference genome, or distanced by at least 10 kb, at least 50 kb, or at least 100 kb if those regions are on the same chromosome in the reference genome. In embodiments in which cfDNA isolated from blood is analysed, the distance between the first and second regions of interest can be much shorter, e.g., at least 1 kb or at least 5 kb, because cfDNA is heavily fragmented (having a median size that is well below 1 kb, e.g., in the range of 50 bp to 500 bp) and, as such, no amplification products would be expected if the first and second regions are 1 kb or 5 kb apart.
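A minimal sketch of this separation criterion is given below, assuming that each region is described by a chromosome name and 0-based, half-open coordinates in the reference genome. The `Region` class, the function name and any example coordinates are hypothetical; the default threshold of 10 kb is one of the values mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A region of interest in the reference genome (hypothetical type)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def sufficiently_separated(a: Region, b: Region, min_gap_bp: int = 10_000) -> bool:
    """Return True if no amplification product is expected without a rearrangement.

    Regions on different chromosomes always qualify. Regions on the same
    chromosome qualify only if the gap between them is at least min_gap_bp.
    """
    if a.chrom != b.chrom:
        return True
    # Gap between the end of the upstream region and the start of the
    # downstream region; negative if the regions overlap.
    gap = max(a.start, b.start) - min(a.end, b.end)
    return gap >= min_gap_bp
```

For cfDNA samples the same check could be called with a shorter threshold, for example `min_gap_bp=1_000`.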

In any embodiment, all of the forward primers in the forward primer pool may comprise: i. a sequence at the 3′ end that is complementary to a binding site in a first region of interest and ii. a 5′ tail that is not complementary to a sequence in the first region of interest, where the sequences at the 3′ end of the forward primers are complementary to different sites in the first region of interest, and all of the reverse primers in the reverse primer pool may comprise: i. a sequence at the 3′ end that is complementary to a binding site in the second region of interest and ii. a 5′ tail that is not complementary to a sequence in the second region of interest, where the sequences at the 3′ end of the reverse primers are complementary to different sites in the second region of interest.

In some embodiments, methods disclosed herein are used to exponentially amplify small stretches of DNA, for example DNA molecules that are up to 500 nucleotides in length. This can be achieved in a number of ways. Usually, the DNA will be fragmented prior to carrying out the method. Fragmentation may have occurred already, for example in the body of a patient such that the sample obtained already contains fragmented DNA. Alternatively, a step of DNA fragmentation may be included in the method itself.

In one embodiment, there is provided a method of detecting a genomic rearrangement event, comprising:

-   a. contacting a sample comprising a DNA molecule of interest (DMOI) with one or more forward primers and one or more reverse primers, wherein the or each of the forward primers is specific for a first region of interest, and the or each of the reverse primers is specific for a second, different, region of interest; and
-   b. conducting PCR.

In one embodiment there is provided a method of detecting a genomic rearrangement event, comprising:

-   a. contacting a sample comprising a DNA molecule of interest (DMOI) with a pool of forward primers specific for a first region of interest and a pool of reverse primers specific for a second, different, region of interest;
-   b. conducting PCR; and
-   c. determining the sequence of the amplification product of the PCR.

In a preferred embodiment, the method comprises all of the following steps:

-   a. contacting a sample comprising DNA molecules of interest (DMOIs) with a pool of at least 20 region-specific forward primers and a pool of at least 20 region-specific reverse primers, wherein:
    -   i. each of the forward primers in the forward primer pool comprises a sequence specific for a first region of interest and a first primer binding site; and
    -   ii. each of the reverse primers in the reverse primer pool comprises a sequence specific for a second, different, region of interest and a second primer binding site;
-   b. amplifying the DMOIs using the region-specific primers;
-   c. conducting PCR using forward primers that target the first primer binding site and reverse primers that target the second primer binding site;
-   d. sequencing the PCR amplification product to provide a library of sequence reads, wherein the sequence reads comprise the sequence of a forward and/or reverse primer used in step (a); and
-   e. using the sequence reads provided in step (d) to determine the sequence of a genomic fusion between the first and second regions of interest.

In some embodiments, the first primer binding site is the same in each of the at least 20 region-specific forward primers and the second primer binding site is the same in each of the at least 20 region-specific reverse primers. The first and second primer binding sites may be different from each other. The first and second primer binding sites may also be universal primer binding sites.

The combination of the selective PCR with the step of sequencing allows non-specific and off-target amplification products to be ruled out and genuine genomic rearrangement events to be identified.

The genomic rearrangement event can be an unknown genomic rearrangement event, since no prior knowledge of the exact nature of the genomic rearrangement is needed for the methods disclosed herein to be able to detect and characterise the genomic rearrangement.

The first and second regions of interest may be in different genes. The different regions of interest or different genes are located at different places in a given genome when a genomic rearrangement has not occurred, and may even be located on different chromosomes when a genomic rearrangement event has not occurred.

The forward primers and reverse primers are present in a pool of forward and reverse primers. The forward primers in the pool of forward primers are specific to a first region of interest (such as a first gene), and the reverse primers in the pool of reverse primers are specific to a second, different, region of interest (such as a second gene). Given the normal location of the first and second regions or first and second genes, when PCR is conducted, a PCR product is produced only when a genomic rearrangement event has occurred. Therefore, the PCR can be referred to as a selective PCR.

In some embodiments, the first and second regions of interest are located on the same chromosome but are located such that no PCR amplification product is generated in the absence of a genomic rearrangement event. In some embodiments, the first and second regions of interest are located on the same chromosome but are separated by a number of base pairs that prevents a PCR occurring under normal conditions. For example, the first and second regions may be at least 100 base pairs, or at least 250 base pairs, or at least 500 base pairs, preferably at least about 1000 base pairs apart.

In some embodiments, each of the first and second regions of interest is at least 1 kilobase in length and the first and second regions are separated by at least 1 kilobase when no genomic rearrangement event has occurred.

Given that the sequence of an amplification product is indicative of a genomic rearrangement event, the methods disclosed herein comprise determining the sequence of a PCR amplification product. In particular, the relevant PCR amplification product to detect is the amplification product resulting from the PCR of one or more pairs of forward and reverse primers targeting the two regions of interest. Determining the sequence of a PCR amplification product allows the definitive detection of genomic rearrangements as non-specific or off-target amplification can be discounted. Therefore, it is the sequence of the DMOI that is indicative of the presence of a genomic rearrangement event between the first and second regions of interest.

As noted above, the forward and reverse primers are present in pools of primers. In such embodiments, the primers preferably tile the regions of interest. The forward primers tile the first region of interest and the reverse primers tile the second region of interest. Tiling the regions involves providing primers that target different stretches of DNA in the respective regions of interest. Primers in the same pool of forward or reverse primers target different stretches of DNA in the respective regions of interest.

In a preferred embodiment, the DNA sequences in the regions of interest that are targeted by the pool of forward or reverse primers do not overlap. In other words, the region-specific tract of each member of a primer pool is different and does not overlap with the region-specific tract of any other member of the primer pool. However, in a given pool multiple copies of the same primer are of course possible, and indeed are preferred. In some embodiments, the pool of primers (either forward or reverse) comprises a set of primers each targeting a different, non-overlapping, DNA tract of the region of interest, but the pool comprises multiple copies of each member of the set.

As noted above, when a pool of primers tiles a region of interest, the tiling is such that the primers do not target overlapping DNA stretches of the region of interest, and more preferably the primers tile at intervals, with gaps between the stretches of DNA in the region of interest being targeted by the primers. In some embodiments, the forward and/or reverse primers tile the first and/or second region of interest at intervals of from about 10 to about 2000 base pairs, from about 10 to about 1000 base pairs, from about 10 to about 500 base pairs, from about 10 to about 250 base pairs, from about 10 to about 150 base pairs, from about 25 to about 125 base pairs, from about 50 to about 100 base pairs, or from about 60 to about 90 base pairs. An appropriate frequency for tiling the regions of interest at certain intervals can be determined by the skilled person. However, tiling at intervals of from about 60 to about 90 base pairs (or up to about 100 base pairs) can be particularly useful for targeting DNA that is approximately 150-160 base pairs in length, such as circulating tumour DNA.
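As a non-limiting illustration, non-overlapping primer target sites tiled at a fixed interval might be laid out as sketched below. The function name is hypothetical, the primer length of 20 bases is an assumed value, and “interval” is interpreted here as the gap between the end of one target site and the start of the next, which is one possible reading of the term.

```python
def tile_sites(region_start: int, region_end: int,
               primer_len: int = 20, interval: int = 75):
    """Return (start, end) coordinates of non-overlapping primer target sites.

    Coordinates are 0-based and half-open. `interval` is the number of
    untargeted bases left between consecutive target sites (an assumed
    interpretation), so consecutive sites start primer_len + interval apart.
    """
    sites = []
    pos = region_start
    while pos + primer_len <= region_end:
        sites.append((pos, pos + primer_len))
        pos += primer_len + interval
    return sites
```

With these assumed values, a 1 kilobase region yields 11 target sites, the second of which spans positions 95 to 115.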

Similarly, the size of the gaps between sequences of the regions of interest targeted by the primers in a primer pool may be from 1 to 150 bases, from 10 to 100 bases, from 25 to 100 bases or preferably from 50 to 100 bases. Such intervals are particularly useful for ctDNA fragments, which are approximately 160 base pairs in length. Hence intervals of 50 to 100 bases (e.g. 75 bases) help to ensure that a ctDNA fragment derived from a region of interest will be targeted by at least one of the forward or reverse primers in the pool.
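The relationship between gap size and fragment length can be checked with a short calculation. The sketch below assumes a uniform primer length of 20 bases (a value not stated above), regularly spaced target sites, a region long enough that edge effects can be ignored, and that a fragment must contain an entire target site in order to be targeted. Under these assumptions a fragment of length F is guaranteed to contain a whole target site wherever it starts if F ≥ 2p + g − 1, where p is the primer length and g is the gap. The function names are hypothetical.

```python
def min_fragment_length_for_guaranteed_site(primer_len: int, gap: int) -> int:
    """Shortest fragment that always contains one whole target site."""
    return 2 * primer_len + gap - 1


def fragment_contains_site(frag_start: int, frag_len: int,
                           primer_len: int, gap: int) -> bool:
    """Check whether a fragment fully contains at least one target site.

    Target sites are assumed to start at 0, period, 2 * period, ...
    where period = primer_len + gap.
    """
    period = primer_len + gap
    # Index of the first target site starting at or after frag_start
    # (ceiling division).
    first_k = -(-frag_start // period)
    site_start = first_k * period
    return site_start + primer_len <= frag_start + frag_len
```

With 20-base primers and 75-base gaps the minimum guaranteed length is 114 bases, comfortably below 160 bases. A gap of 150 bases would require 189 bases, so coverage of every 160 base fragment would no longer be guaranteed under these assumptions.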

In some embodiments the pool of forward primers comprises at least 20, at least 50 or at least 100 different forward primers. Preferably the different primers target stretches of DNA in the region of interest that are not overlapping with each other. Multiple copies of each primer may be present in the pool.

In some embodiments, the pool of reverse primers comprises at least 20, at least 50 or at least 100 different reverse primers. Preferably the different primers target stretches of DNA in the region of interest that are not overlapping with each other. Multiple copies of each primer may be present in the pool.

In one embodiment, the method comprises contacting a sample containing the DMOI with a pool of forward primers and a pool of reverse primers, wherein the pool of forward primers comprises at least 20, at least 50 or at least 100 different forward primers and the pool of reverse primers comprises at least 20, at least 50 or at least 100 different reverse primers. Preferably the different primers target stretches of DNA in the region of interest that are not overlapping. Multiple copies of each primer may be present in the pool.

Preferably at least 100 forward and reverse primers are used, although the total number could be higher (for example at least 500 or at least 1000 forward and reverse primers) to enable larger and more regions of interest to be targeted in a single reaction.

The pool of forward primers may comprise at least 20 different forward primers and the pool of reverse primers may comprise at least 20 different reverse primers.

In some embodiments, the methods comprise contacting a sample comprising a DMOI with a pool of at least 20 different forward primers (for example at least 100 different forward primers) specific for a first region of interest and a pool of at least 20 different reverse primers (for example at least 100 different reverse primers) specific for a second region of interest, wherein each of the first and second regions of interest is at least 1 kilobase in length and the first and second regions are separated by at least 1 kilobase when no genomic rearrangement event has occurred, and further wherein the primers tile their respective regions of interest at intervals of from 50 to 100 bases. The first and second regions may be different genes.

The primers may be of any suitable length, for example they may be from 5 to 50 base pairs in length, for example from 10 to 40 base pairs in length, or from 18 to 35 base pairs in length. The skilled person is familiar with the use of primers in PCR and would be able to determine an appropriate size for a primer.

To assist in the analysis, the sequences of all the PCR primers used to target the first and second regions of interest (the selective PCR primers) are known.
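One way in which known primer sequences could assist the analysis is sketched below: a sequence read is assigned to the forward and reverse primers that flank it. This is an illustrative sketch only. The function and primer names are hypothetical, and the read is assumed to be oriented so that it begins with the region-specific sequence of a forward primer and ends with the reverse complement of the region-specific sequence of a reverse primer.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_primer_pair(read: str, forward_primers: dict, reverse_primers: dict):
    """Identify which known primers flank a read.

    forward_primers and reverse_primers map a primer name to its
    region-specific sequence. Returns (forward_name, reverse_name), or
    None if the read is not flanked by one primer from each pool.
    """
    fwd = next((name for name, seq in forward_primers.items()
                if read.startswith(seq)), None)
    rev = next((name for name, seq in reverse_primers.items()
                if read.endswith(reverse_complement(seq))), None)
    if fwd is None or rev is None:
        return None
    return fwd, rev
```

A read for which `None` is returned is not flanked by a recognised primer pair, and a downstream analysis might treat such a read as a candidate non-specific or off-target product.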

The methods disclosed herein are useful for detecting the sequence of multiple types of genomic rearrangement events. Of particular significance are gene fusion events, which may be associated with a disease or condition. In some embodiments, the method comprises determining the presence or absence of a genomic rearrangement that is known or is suspected to be associated with a disease or disorder. In some embodiments, the methods determine the presence or absence of a gene fusion event that is known, or is suspected to be, associated with cancer. One advantage of the methods and kits disclosed herein is that they can detect any gene fusion event between two genes, without the need for prior knowledge of the precise fusion event that has occurred. The methods and kits disclosed herein can also significantly reduce the amount of sequencing required since the gene rearrangement is selectively enriched. The step of sequencing the amplification product provides information on the precise fusion or genomic rearrangement event that has occurred and ensures that only a true gene rearrangement is detected/reported.

When a genomic rearrangement event has occurred, at least one pair of forward and reverse primers anneals to the DMOI within 500 base pairs from each other, or within 400 base pairs from each other, or within 300 base pairs from each other, or within 200 base pairs from each other, or within 175 base pairs from each other.

The primers themselves may be gene specific primers, with each of the forward primers being specific for a first gene (i.e. a first region of interest) and each of the reverse primers being specific for a second, different, gene (i.e. a second region of interest). The primers comprise a region-specific sequence that enables the primer to anneal to a region of interest.

The primers used in the selective PCR may also comprise other features. For example, the primers may comprise sequencing adaptors or partial sequencing adaptors that allow the amplification product (if one is produced) to be sequenced without the need for ligating on adaptors separately (see, for example, Weaver et al., Nat Genet., 2014; 46:837-843 and Forshew et al., Sci Transl Med., 2012; 4). Adaptors are moieties that allow sequencing of DNA, in particular using high-throughput sequencing (i.e. next generation sequencing, NGS), and they are familiar to the skilled person. Most commonly, and potentially in addition to sequencing adaptors, the region-specific primers may comprise one or more primer binding sites, in particular universal primer binding sites (UPS). The incorporation of the UPSs into the amplification product allows the amplification product of a first reaction to be targeted again with a further pair of primers that are specific for the UPS. The primers used in the second PCR may themselves comprise the sequencing adaptors or partial sequencing adaptors that allow the amplification product of the second PCR to be sequenced using NGS. When two or more PCR reactions are used, the methods may comprise a step of purification of the amplicons from the first PCR before conducting the second PCR. The first PCR is a multiplex PCR, whereas the second PCR is not multiplex PCR, since only primers specific for the universal primer sites introduced in the first round of PCR are used in the second PCR. However, this second PCR step may still act to selectively amplify DMOIs that represent genomic rearrangement events, since only those DMOIs will have been amplified in the first PCR (apart from some possible non-specific PCR amplification), although in itself this is not a selective amplification step.

PCR can be used to introduce a number of features into the DMOI. For example, PCR may incorporate a universal primer binding site (or sequencing adapter, as discussed above), a molecular barcode and/or index sequence into the PCR product. Index sequences may be a sequence that identifies the DNA as deriving from a particular sample or patient, and so may be a patient or sample-specific index sequence. A molecular barcode may be used to identify different starting DMOI in a given sample, and so may be DMOI-specific molecular barcodes. Typically, a molecular barcode and universal primer binding sites may be introduced in the first PCR using the region-specific primers. An index sequence may typically be incorporated in a second PCR using primers that target the universal primer binding sites introduced in the first PCR.
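The distinct roles of the index sequence and the molecular barcode can be illustrated with the following sketch, which counts the number of different starting DMOIs observed per sample. It assumes that the index sequence and molecular barcode have already been extracted from each sequence read; the tuple layout and function name are hypothetical and do not reflect any particular read structure.

```python
from collections import defaultdict


def count_starting_molecules(reads):
    """Count distinct molecular barcodes per sample index.

    `reads` is an iterable of (sample_index, molecular_barcode, insert)
    tuples. Reads sharing a sample index and a molecular barcode are
    treated as copies of the same starting DMOI.
    """
    barcodes_by_sample = defaultdict(set)
    for sample_index, molecular_barcode, _insert in reads:
        barcodes_by_sample[sample_index].add(molecular_barcode)
    return {sample: len(barcodes)
            for sample, barcodes in barcodes_by_sample.items()}
```

In this sketch, reads with the same index but different barcodes are attributed to the same sample but to different starting molecules.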

Sequencing adaptors may be incorporated in the first or second PCR. Alternatively, partial sequencing adaptors may be incorporated into the first PCR and partial sequencing adaptors may be incorporated in a second PCR, subsequently completing the sequencing adaptors. The universal primer binding sites that are incorporated in the method may make up a first portion of a sequencing adaptor that is completed with a second portion of the sequencing adaptor when a subsequent PCR takes place. Therefore, the sequencing adaptors may comprise the universal primer binding sites. Such an embodiment is described in FIG. 7. However, the precise method used to incorporate the sequencing adaptors is not crucial and the incorporation of sequencing adaptors to allow the amplification product to be sequenced is familiar to the skilled person and appropriate PCR based methods may be used. Preferably the sequencing adaptors are not incorporated or attached by ligation.

The step of “amplifying the DMOIs using the region-specific primers” may be achieved in a number of ways. A key aspect is to incorporate the first and second universal primer binding sites from the region-specific primers into a reaction product to allow a subsequent PCR targeting those first and second primer binding sites.

For example, amplification of the DMOIs using the region-specific primers may comprise extension reactions (as in the sequential methods, discussed below) and/or PCR. There are therefore several appropriate workflows for the methods disclosed herein. In some embodiments, the sample comprising the DMOIs is contacted with the forward and reverse region-specific primers and PCR conducted with forward and reverse primers present. The PCR incorporates universal primer binding sites into the PCR product, which is then targeted in a second PCR using universal primers to incorporate sequencing adaptors. For example, as follows:

-   a. contacting a sample comprising DNA molecules of interest (DMOIs) with a pool of at least 20 region-specific forward primers and a pool of at least 20 region-specific reverse primers, wherein:
    -   i. each of the forward primers in the forward primer pool comprises a sequence specific for a first region of interest and a first primer binding site; and
    -   ii. each of the reverse primers in the reverse primer pool comprises a sequence specific for a second, different, region of interest and a second primer binding site;
-   b. amplifying the DMOIs using the region-specific primers by PCR to incorporate the first and second primer binding sites into the amplification product;
-   c. conducting a further PCR using forward primers that target the first primer binding site and reverse primers that target the second primer binding site to incorporate sequencing adaptors into the amplification product;
-   d. sequencing the PCR amplification product to provide a library of sequence reads, wherein the sequence reads comprise the sequence of a forward and/or reverse primer used in step (a); and
-   e. using the sequence reads provided in step (d) to determine the sequence of a genomic fusion between the first and second regions of interest.

In some embodiments, the first primer binding site is the same in each of the at least 20 region-specific forward primers and the second primer binding site is the same in each of the at least 20 region-specific reverse primers. The first and second primer binding sites may be different from each other. The first and second primer binding sites may also be universal primer binding sites.
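The structure of the products of the two PCR steps described above can be modelled with simple string operations, as sketched below. This is a toy model under stated assumptions: the template is represented by its top strand only, `fwd_site` and `rev_site` are the top-strand sequences flanking the fusion (the reverse primer would in practice carry the reverse complement of `rev_site`), and all sequences and function names are hypothetical rather than real primer or adaptor sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def first_pcr_product(template, fwd_site, rev_site, ups1, ups2):
    """Top strand of the first PCR product, or None if no product forms.

    A product is modelled only when rev_site lies downstream of fwd_site
    on the same template molecule. The 5' tails of the primers add ups1
    at one end and the reverse complement of ups2 at the other.
    """
    start = template.find(fwd_site)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd_site))
    if end == -1:
        return None
    insert = template[start:end + len(rev_site)]
    return ups1 + insert + reverse_complement(ups2)


def second_pcr_product(product, ups1, ups2, adaptor1, adaptor2):
    """Top strand after a second PCR with primers targeting the two sites.

    Returns None if the input does not carry both primer binding sites.
    """
    if not product.startswith(ups1):
        return None
    if not product.endswith(reverse_complement(ups2)):
        return None
    return adaptor1 + product + reverse_complement(adaptor2)
```

In this model a template lacking either region-specific site yields no first-round product and therefore nothing for the second PCR to amplify, which mirrors the selective nature of the first PCR.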

It is possible to even conduct the entire method using a single PCR, in which the incorporation of universal primer binding sites is not required, for example as follows:

-   a. contacting a sample comprising DNA molecules of interest (DMOIs) with a pool of at least 20 region-specific forward primers and a pool of at least 20 region-specific reverse primers, wherein:
    -   i. each of the forward primers in the forward primer pool comprises a sequence specific for a first region of interest and a first primer binding site and a sequencing adaptor; and
    -   ii. each of the reverse primers in the reverse primer pool comprises a sequence specific for a second, different, region of interest and a second primer binding site, and a sequencing adaptor;
-   b. amplifying by PCR the DMOIs using the region-specific primers, wherein the sequencing adaptors are incorporated into the amplification product by the PCR;
-   c. sequencing the PCR amplification product to provide a library of sequence reads, wherein the sequence reads comprise the sequence of a forward and/or reverse primer used in step (a); and
-   d. using the sequence reads provided in step (c) to determine the sequence of the genomic fusion between the first and second regions of interest.

In some embodiments, the first primer binding site is the same in each of the at least 20 region-specific forward primers and the second primer binding site is the same in each of the at least 20 region-specific reverse primers. The first and second primer binding sites may be different from each other. The first and second primer binding sites may also be universal primer binding sites. The forward and reverse primers may also comprise a molecular barcode and/or an index sequence.

Alternatively, the method can be conducted sequentially, which requires the use of one or more extension reactions followed by one or more PCR amplifications. For example, the sample comprising the DMOIs may be contacted with the forward region-specific primers, and one or more extension reactions conducted to extend the annealed primers along the DMOI. The forward primers comprise a region-specific sequence and a first primer binding sequence for use in a subsequent PCR. The one or more extension reactions incorporate the first primer binding site into the daughter molecules and also amplify a first strand of the DMOI. Subsequently, one or more extension reactions are conducted using the reverse region-specific primers to incorporate a second primer binding site into the daughter molecules and also amplify the second strand of the DMOI. A PCR can then be conducted using primers that target the first and second primer binding sites incorporated by the one or more extension reactions. That PCR may also incorporate sequencing adaptors to allow the reaction product to be sequenced and the genomic rearrangement to be characterised. Note that the method may comprise conducting a single extension reaction for the forward primer and/or a single extension reaction for the reverse primer. However, in preferred embodiments, the method comprises a plurality of extension reactions for both the forward and reverse primers. Conducting a plurality of extension reactions means:

-   a) contacting the DMOIs with the forward region-specific primers;
-   b) allowing the forward region-specific primers to anneal to the DMOIs;
-   c) conducting an extension reaction to extend annealed forward region-specific primers along the DMOI;
-   d) denaturing the resulting double-stranded DNA molecule; and
-   e) repeating steps (b) to (d) a plurality of times.

The same steps can be undertaken for the reverse region-specific primers.
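The difference between repeated extension reactions with a single primer pool and a subsequent PCR with a primer pair can be illustrated with idealised arithmetic. The sketch below assumes perfect efficiency in every cycle, which real reactions do not achieve, and the function names are hypothetical. With only one primer present, each cycle copies the original template once, so extension products accumulate linearly; with both primers present, products of one cycle serve as templates for the next, so the number of molecules doubles per cycle.

```python
def linear_extension_products(templates: int, cycles: int) -> int:
    """Extension products after repeated single-primer extension cycles.

    Idealised: every template is copied exactly once per cycle and the
    products themselves are not copied, because no opposing primer is present.
    """
    return templates * cycles


def exponential_pcr_molecules(templates: int, cycles: int) -> int:
    """Molecules after PCR with a primer pair, assuming perfect doubling."""
    return templates * 2 ** cycles
```

For example, under these idealised assumptions 10 templates give 100 extension products after 10 single-primer cycles, whereas 10 cycles of PCR would give 10,240 molecules.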

The skilled person will be aware of suitable reaction conditions to allow the components of the reaction to anneal, extend or denature, as appropriate.

Accordingly, in one embodiment, the method comprises:

-   a. contacting the sample comprising the DMOIs with the pool of at least 20 region-specific forward primers;
-   b. conducting one or more extension reactions to extend annealed forward primers along the DMOIs and to introduce the first primer binding site into the extension product;
-   c. optionally removing or deactivating the region-specific forward primers from the reaction mixture;
-   d. contacting a sample obtained in step (c) with the pool of at least 20 region-specific reverse primers;
-   e. conducting one or more extension reactions to extend annealed reverse primers along the DMOIs and to introduce the second primer binding site into the extension product;
-   f. optionally removing or deactivating the region-specific reverse primers from the reaction mixture; and
-   g. conducting PCR using forward primers that target the first primer binding site introduced in step (b) and reverse primers that target the second primer binding site introduced in step (e) to amplify a genomic fusion event between the first and second regions of interest.

The forward primers used in step (a) comprise a sequence specific for a first region of interest and a first primer binding site, and in some embodiments the first primer binding site is the same in each of the forward primers in the forward primer pool. The reverse primers used in step (d) comprise a sequence specific for a second, different, region of interest and a second primer binding site, and in some embodiments the second primer binding site is the same in each of the reverse primers in the reverse primer pool.

The one or more extension reactions of steps (b) and (e) incorporate the first and second primer sites present in the forward and reverse region-specific primers into the extension products. The first and second primer sites introduced in steps (b) and (e) therefore act as forward and reverse primer sites for the primer pair used in step (g) to amplify a genomic rearrangement between the first and second regions of interest. A fusion-specific exponential reaction product will only be produced when a genomic rearrangement has occurred to situate the first and second primer sites sufficiently close together in a single molecule, hence allowing a genomic rearrangement between the first and second regions to be identified.

In any methods disclosed herein that comprise the incorporation of first and second universal primer binding sites into the amplification product, the first and second universal primer binding sites are different from each other.

It is also noted that the terms “forward” and “reverse” as used throughout are simply for the purposes of orientation and explanation. The skilled person will be aware that the “forward” and “reverse” designation could be switched, without affecting the method in any way.

In another embodiment, the method may comprise:

-   a. contacting the sample comprising the DMOIs with the pool of at least 20 forward primers;
-   b. conducting one or more extension reactions to extend annealed forward primers along the DMOIs and to introduce the first primer binding site into the extension product;
-   c. optionally removing or deactivating the forward primers from the reaction mixture;
-   d. contacting a sample obtained in step (c) with:
    -   i. the pool of at least 20 reverse primers; and
    -   ii. primers targeting the first primer binding site added in step (a); and
-   e. conducting PCR to amplify a genomic fusion event between the first and second regions of interest.

Again, the forward primers used in step (a) comprise a sequence specific for a first region of interest and a first primer binding site, and in some embodiments the first primer binding site is the same in each of the forward primers in the forward primer pool. The reverse primers used in step (d)(i) comprise a sequence specific for a second, different, region of interest and a second primer binding site, and in some embodiments the second primer binding site is the same in each of the reverse primers in the reverse primer pool.

Similarly, the one or more extension reactions of step (b) incorporate the first primer site into extension products arising from DMOIs that comprise the first region of interest. In step (d), fusion-specific exponential PCR will occur where the second region of interest is sufficiently close to the first priming site introduced in the one or more extension reactions of step (b). The first primer site introduced in step (b) and the second region of interest targeted in step (d)(i) therefore act as forward and reverse primer sites for the primer pair used in step (e) to amplify a genomic rearrangement between the first and second regions of interest.

In the two sequential methods described above using one or more extension reactions, it is still possible to use primer pools that target multiple first and second regions of interest, including using panels of primers that tile the regions of interest. Also, the priming sites incorporated into the extension and/or PCR products can be universal primer binding sites.

Methods of removal or deactivation of primers are well known to persons of skill in the art. For example, the step of removing the primers may comprise removal by size selection, size exclusion columns, gel extraction, or silica membrane columns. The step of deactivating the primers may comprise enzymatic digestion of the primers. The steps of removal and/or deactivation of primers are optional. However, they may be present in preferred embodiments.

In both example sequential methods outlined above, the method further comprises a step of sequencing the PCR product. This may be achieved by conducting a further PCR amplification reaction, for example to introduce sequencing adaptors into the DMOI, prior to sequencing the DMOIs. The sequencing adaptors allow the amplified DMOIs to be sequenced using next generation sequencing techniques. Alternatively, the DMOIs can be prepared for sequencing using a single PCR by using primers that incorporate sequencing adaptors into the DMOIs during the first PCR. This can be achieved by, for example, using primers in the first PCR that also incorporate the sequencing adaptors, avoiding the need for an additional PCR.

In one embodiment, the method comprises:

-   a. contacting the sample comprising the DMOIs with the pool of at least 20 forward primers;
-   b. conducting one or more extension reactions to extend annealed forward primers along the DMOIs and to introduce the first primer binding site into the extension product;
-   c. optionally removing or deactivating the forward primers from the reaction mixture;
-   d. contacting a sample obtained in step (c) with the pool of at least 20 reverse primers;
-   e. conducting one or more extension reactions to extend annealed reverse primers along the DMOIs and to introduce the second primer binding site into the extension product;
-   f. optionally removing or deactivating the reverse primers from the reaction mixture;
-   g. conducting PCR using forward primers that target the first primer binding site introduced in step (b) and reverse primers that target the second primer binding site introduced in step (e) to amplify a genomic fusion event between the first and second regions of interest, wherein either:
    -   i. the primers used in step (g) comprise sequencing adaptors; or
    -   ii. the method further comprises a second PCR amplification reaction to incorporate sequencing adaptors into the reaction product; and
-   h. sequencing the reaction product of step (g).

In another embodiment the method comprises:

-   a. contacting the sample comprising the DMOIs with the pool of at least 20 forward primers;
-   b. conducting one or more extension reactions to extend annealed forward primers along the DMOIs and to introduce the first primer binding site into the extension product;
-   c. optionally removing or deactivating the forward primers from the reaction mixture;
-   d. contacting a sample obtained in step (c) with:
    -   i. the pool of at least 20 reverse primers; and
    -   ii. primers targeting the first primer binding site introduced in step (b);
-   e. conducting PCR to amplify a genomic fusion event between the first and second regions of interest, wherein either:
    -   i. the primers used in step (d) comprise sequencing adaptors; or
    -   ii. the method further comprises a second PCR amplification reaction to incorporate sequencing adaptors; and
-   f. sequencing the reaction product of step (e).

As noted, the methods disclosed herein comprise sequencing the amplification product from a PCR (either the first PCR, or a second or subsequent PCR if a second or subsequent PCR is used). Hence in some embodiments, the step of determining the presence or absence of a genomic rearrangement event comprises determining the sequence of the DMOI (or rather, the amplification product, which corresponds to the portion of the DMOI that has been amplified). The sequencing can be high-throughput sequencing (next generation sequencing). In some embodiments, the high-throughput sequencing is selected from the group consisting of sequencing-by-synthesis (SBS), sequencing-by-ligation (SBL) and long-read sequencing (LRS). In some embodiments, the sequencing-by-synthesis is selected from the group consisting of cyclic reversible termination SBS and single-nucleotide addition SBS. In some embodiments, the long-read sequencing is selected from the group consisting of single-molecule LRS and synthetic long-read LRS. Specific methods include platforms such as Illumina (e.g. MiSeq or HiSeq), Oxford Nanopore, Pacific Biosciences, Roche 454, Ion Torrent (Proton/PGM sequencing), SOLiD sequencing etc.

Prior to sequencing of the amplicons generated from the various PCR reactions used in the methods, the methods may comprise a step of purification of the amplicons. Methods for purification are known to the skilled person and commercial kits are available for this purpose (for example SPRIselect from Beckman Coulter). The same techniques can be used to purify the amplicons between PCR reactions.

Other steps that may be undertaken prior to sequencing may include size selection. For example, the methods may comprise a step of selecting amplicons having a size of between 100 and 500 base pairs (for example between 200 and 350 base pairs). Alternatively, or additionally, the amplicons may also be quantified prior to sequencing.

Some embodiments also provide a method for determining the sequence of a DNA molecule of interest (DMOI) or a portion thereof (said portion comprising a junction of the genomic rearrangement or the junction of a gene fusion), the method comprising:

-   a. providing a sample obtained from a patient, wherein the sample comprises a DMOI;
-   b. optionally processing the sample;
-   c. conducting a first PCR using a pool of at least 20 forward and a pool of at least 20 reverse primers disclosed herein, wherein the first PCR incorporates universal primer binding sites;
-   d. conducting a second PCR using at least one pair of forward and reverse primers that are specific for the universal primer binding sites incorporated in the first PCR, wherein the second PCR incorporates sequencing adaptors into the amplification product of the PCR; and
-   e. determining the sequence of the DMOI or portion thereof.

The disclosure also provides a method for determining the sequence of a DNA molecule of interest (DMOI) or a portion thereof (said portion comprising a junction of the genomic rearrangement or the junction of a gene fusion), the method comprising:

-   a. providing an amplicon prepared by a method disclosed herein (such as method steps (a) to (d) above or any method of detecting genomic fusions disclosed herein); and
-   b. determining the sequence of the DMOI or portion thereof.

The above methods can be used to characterise the genomic rearrangement or gene fusion by determining its sequence.

Determining the sequence of the tagged and enriched DMOI can be carried out according to any suitable method known to the skilled person. However, given the benefits of such approaches, next-generation sequencing (NGS) methods are preferred. Next-generation sequencing is also referred to as high-throughput sequencing and massively-parallel sequencing in the art and is known and understood by the skilled person. A review of next-generation sequencing techniques is provided in Goodwin et al., “Coming of age: ten years of next-generation sequencing technologies”, 2016, Nature Reviews Genetics, 17:333-351.

Methods disclosed herein may comprise paired-end sequencing, so as to provide the complete sequence of the DMOI in the sequence read, even if the sequence read length is shorter than the length of the DMOIs. Paired-end sequence reads are known to the skilled person. Preferably, the sequence reads include the sequence of both the forward and reverse region-specific primers used in the selective amplification step. Since the sequencing adaptors are incorporated using primers that incorporate the sequencing adaptors upstream (i.e. 5′) of the forward and reverse region-specific primers (in particular upstream of the first and second universal primer binding sites incorporated in the first amplification step), it will always be possible to provide sequence reads that comprise the sequence of the forward and reverse region-specific primers (or, at the very least, the component of the forward and reverse region-specific primers that anneals to the first or second region of interest, respectively).

In some embodiments comprising NGS, the method may further comprise localising amplified DMOIs to discrete sites. The discrete sites may comprise a solid or semi-solid substrate. The method may also comprise hybridising or immobilising the DMOIs to the solid or semi-solid substrate and clonally amplifying the localised DMOIs.

In one embodiment, there is provided a method comprising:

-   a. contacting a sample comprising a DNA molecule of interest (DMOI) with a pool of primers comprising at least 20 forward primers and at least 20 reverse primers, wherein the forward primers are specific for a first region of interest and the reverse primers are specific for a second, different, region of interest, and wherein the primers comprise a region-specific sequence and a sequencing adaptor;
-   b. conducting PCR; and
-   c. sequencing the amplification product or products of the PCR using high-throughput sequencing.

It is also possible to use multiple pools of primers. For example, in one embodiment, there is provided a method comprising:

-   a. contacting a sample comprising a DNA molecule of interest (DMOI) with at least two pools of primers, wherein each pool of primers comprises a set of at least 20 forward primers and at least 20 reverse primers, wherein each set of forward primers is specific for a first region of interest and each set of reverse primers is specific for a second, different, region of interest, and wherein the primers comprise a region-specific sequence and a sequencing adaptor;
-   b. conducting PCR; and
-   c. sequencing the amplification product or products of the PCR using high-throughput sequencing.

In one embodiment, there is provided a method comprising:

-   a. contacting a sample comprising a DNA molecule of interest (DMOI) with a pool of primers comprising at least 20 forward primers and at least 20 reverse primers, wherein the forward primers are specific for a first region of interest and the reverse primers are specific for a second, different, region of interest, and wherein the primers comprise a region-specific sequence and a universal primer binding site;
-   b. conducting PCR;
-   c. contacting the amplification product from step (b) with one or more sets of forward and reverse primers that are specific for the universal primer binding sites introduced by the first PCR, wherein the primers comprise sequencing adaptors;
-   d. conducting a second PCR; and
-   e. sequencing the amplification product or products of the second PCR using high-throughput sequencing.

In one embodiment, there is provided a method comprising:

-   a. contacting a DNA molecule of interest (DMOI) with at least two pools of primers, wherein each pool of primers comprises a set of at least 20 forward primers and a set of at least 20 reverse primers, wherein each set of forward primers is specific for a first region of interest and each set of reverse primers is specific for a second, different, region of interest, and wherein the primers comprise a region-specific sequence and a universal primer binding site;
-   b. conducting PCR;
-   c. contacting the amplification product from step (b) with one or more sets of forward and reverse primers that are specific for the universal primer binding sites introduced by the first PCR, wherein the primers comprise sequencing adaptors;
-   d. conducting a second PCR; and
-   e. sequencing the DMOI using high-throughput sequencing.

In some embodiments it is advantageous to include a positive control. This is to ensure the assay has been carried out correctly to avoid false negative results. For example, the method may comprise including one or more pairs of primers that are specific to a genetic alteration that is different to the genomic rearrangement targeted by the pool of forward and reverse primers. For example, the “genetic alteration” may be a single nucleotide polymorphism (SNP), INDEL, single nucleotide variants (mutations), substitutions, duplications, insertions, deletions, gene copy number variations, and structural variants, including inversions and translocations, or another genetic alteration of interest. The additional primer pair or primer pairs target a region known to contain the genetic alteration of interest. The “genetic alteration” targeted by the “control primers” is distinct from the “genomic rearrangement event” targeted by the “genomic rearrangement primers”.

Additionally, the use of one or more pairs of primers that are specific to a genetic alteration that is different to the genomic rearrangement targeted by the pool of forward and reverse primers may allow further characterisation of, for example, the cancer being diagnosed using the assay. For example, the method could be combined with primers that are selective for specific cancer mutations, such as point mutations. Such embodiments would include the additional primers not only as positive controls but also to provide additional information about the nature of the cancer.

If the method comprises one PCR (for example when the region-specific primers already include sequencing adaptors for sequencing the amplification products), the additional primer pair or primer pairs targeting a different genetic alteration are generally included in the first reaction mixture such that a single PCR can amplify the genomic rearrangement DMOI (if present) and the additional genetic alteration. If the method comprises two PCR reactions (for example when the forward and reverse region-specific primers include UPSs, and a second PCR introduces the sequencing adaptors into the amplification product from the first PCR), the additional primer pair or primer pairs targeting a different genetic alteration may be included in either the first PCR mixture or the second PCR mixture, but preferably will be included in the first PCR mixture.

The two primers of each control primer pair target the same region of interest. Hence they differ from the forward and reverse primers, which target different regions of interest. As such, an amplification product should occur from the control primer pair(s) regardless of the presence or absence of a genomic rearrangement event. It is also possible that one member of the control primer pair or pairs is contained within the pool of forward or reverse primers targeting the genomic rearrangement event. Hence the region containing the genetic alteration may be within or may overlap or overlay with the first or second region of interest targeted by the forward and reverse tiling primers. In other embodiments, the control primer pair(s) target a region or gene that is different to the regions or genes targeted by the genomic rearrangement primers.

To take into account the need to detect an amplification product arising from a genomic rearrangement event as being distinct from an amplification product arising from a genetic alteration, the genomic rearrangement primers may be present at a higher concentration than the control primers. For example, each of the control primers may be present at a concentration that is 50% or lower than that of the genomic rearrangement primers.

Multiple “control” primer pairs can be included in the same reaction, with each primer pair targeting a different genetic alteration. For example, the set of control primers may comprise up to 5, up to 10 or up to 20 or more primer pairs, each primer pair targeting a different genetic alteration. Of course, multiple copies of each primer pair will generally be added to the reaction mixture to ensure the PCR takes place correctly.

As with the selective PCR, the control primer pairs may incorporate adaptor sequences (also referred to herein as sequencing adaptors) into the amplicon from the control PCR. Alternatively, the control primer pairs may incorporate universal primer binding sites into the amplicons from the control PCR, and these are targeted using a further PCR using primers specific for the universal primer binding sites and themselves incorporating the sequencing adaptors.

In embodiments comprising the use of control primers, the method may include detecting the presence or absence of the genetic alteration. This may comprise sequencing a PCR amplification product.

In one embodiment, the methods comprise:

-   a. providing a sample comprising a DMOI;
-   b. optionally extracting the DMOI from the sample;
-   c. conducting a selective PCR on the sample (or extracted DMOI) using a pool of at least 20 forward primers specific for a first gene and a pool of at least 20 reverse primers specific for a second gene, wherein the pool of forward primers tiles the first gene (or a region thereof) and the pool of reverse primers tiles the second gene (or a region thereof), with a space of between 50 and 100 nucleotide bases between adjacent primers in the pools, wherein the selective PCR is performed concurrently with a control PCR using one or more primer pairs specific to a genetic alteration (such as a SNP), and further wherein the selective and control PCR reactions incorporate universal primer binding sites into the amplicons that are generated;
-   d. optionally purifying the amplicons from step (c);
-   e. conducting a further PCR using primers specific for the universal primer binding sites incorporated in step (c);
-   f. optionally purifying the amplicons from step (e);
-   g. sequencing the amplification product from step (f); and
-   h. determining the presence or absence of a genomic rearrangement.

If a genomic rearrangement is present, the nature of the rearrangement can be determined according to the sequence of the amplicons.
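The tiling arrangement of step (c) can be illustrated with a short sketch. This is a hypothetical example only: the function name, the 20-nucleotide primer length, the fixed 75-nucleotide gap and the region coordinates are assumptions chosen for illustration. The disclosure itself specifies only that adjacent primers in a pool are separated by between 50 and 100 nucleotide bases.

```python
def tile_primer_starts(region_start, region_end, primer_len=20, gap=75):
    """Return start coordinates of primers tiling [region_start, region_end).

    Adjacent primers are separated by `gap` bases, measured from the end
    of one primer to the start of the next. All defaults are illustrative
    assumptions, not values taken from the disclosure.
    """
    starts = []
    pos = region_start
    # Keep placing primers while a whole primer still fits in the region.
    while pos + primer_len <= region_end:
        starts.append(pos)
        pos += primer_len + gap
    return starts


# Hypothetical 2 kb region of a gene, in arbitrary coordinates.
starts = tile_primer_starts(0, 2000)
print(len(starts), starts[:3])
```

In practice primer positions would also depend on melting temperature, specificity and strand, none of which this sketch attempts to model.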

Importantly, further selective enrichment steps (beyond the selective PCR step) are not necessary in the methods disclosed herein, for example using hybridisation probes, since a step of enrichment is inherently incorporated into the selective PCR. Therefore, in preferred embodiments, the methods do not comprise enrichment (for example enrichment of any sample, DNA or amplicon) by hybridisation, for example enrichment using hybridisation probes.

The DMOI and Genomic Rearrangements to be Detected

The DNA molecules of interest (DMOIs) may be single stranded or double stranded, but they are preferably double stranded. In some embodiments, the DMOI is DNA obtained by reverse transcription of RNA. Hence in some embodiments, the method comprises converting an RNA sequence to a DNA sequence to obtain the DMOI. Converting an RNA sequence to a DNA sequence may be carried out using a reverse transcriptase.

The DMOI may be cell-free DNA (cfDNA). In a preferred embodiment, the DMOI is a circulating tumour DNA (ctDNA).

The DMOIs are preferably fragmented. The methods may comprise a step of fragmenting the DNA. Alternatively (and most commonly), the DNA may already be fragmented in the sample that is obtained from a patient.

In some embodiments, the DMOIs are up to 500 base pairs in length.

When the DMOIs are ctDNA molecules, the ctDNA may be from a cancer selected from the group consisting of acute lymphoblastic leukemia, acute or chronic lymphocytic or granulocytic tumour, acute myeloid leukemia, acute promyelocytic leukemia, adenocarcinoma, adenoma, adrenal cancer, basal cell carcinoma, bone cancer, brain cancer, breast cancer, bronchi cancer, cervical dysplasia, chronic myelogenous leukemia, colon cancer, epidermoid carcinoma, Ewing's sarcoma, gallbladder cancer, gallstone tumour, giant cell tumour, glioblastoma multiforme, hairy-cell tumour, head cancer, hyperplasia, hyperplastic corneal nerve tumour, in situ carcinoma, intestinal ganglioneuroma, islet cell tumour, Kaposi's sarcoma, kidney cancer, larynx cancer, leiomyomater tumour, liver cancer, lung cancer, lymphomas, malignant carcinoid, malignant hypercalcemia, malignant melanomas, marfanoid habitus tumour, medullary carcinoma, metastatic skin carcinoma, mucosal neuromas, mycosis fungoides, myelodysplastic syndrome, myeloma, neck cancer, neural tissue cancer, neuroblastoma, osteogenic sarcoma, osteosarcoma, ovarian tumour, pancreas cancer, parathyroid cancer, pheochromocytoma, polycythemia vera, primary brain tumour, prostate cancer, rectum cancer, renal cell tumour, retinoblastoma, rhabdomyosarcoma, seminoma, skin cancer, small-cell lung tumour, soft tissue sarcoma, squamous cell carcinoma, stomach cancer, thyroid cancer, topical skin lesion, reticulum cell sarcoma, and Wilms' tumour. In a preferred embodiment, the cancer is lung cancer.

The DMOI may be derived from a fusion gene or a fragment of a fusion gene. The fusion may be a fusion selected from the group consisting of CD74-ROS1, SLC34A2-ROS1, SDC4-ROS1, EZR-ROS1, GOPC-ROS1, LRIG3-ROS1, TPM3-ROS1, PPFIBP1-ROS1, EML4-ALK, BCR-ABL, TCF3-PBX1, ETV6-RUNX1, MLL-AF4, SIL-TAL1, RET-NTRK1, PAX8-PPARG, MECT1-MAML2, TFE3-TFEB, BRD4-NUT, ETV6-NTRK3, TMPRSS2-ERG, TPM3-NTRK1, SQSTM1-NTRK1, CD74-NTRK1, MPRIP-NTRK1 and TRIM24-NTRK2. In some embodiments, the fusion is a fusion between a gene selected from the group consisting of ROS1, ALK, EML4, BCR, ABL, TCF3, PBX1, ETV6, RUNX1, MLL, AF4, SIL, TAL1, RET, NTRK1, PAX8, PPARG, MECT1, MAML2, TFE3, TFEB, BRD4, NUT, ETV6, NTRK3, TMPRSS2, NTRK2 and ERG and at least one other gene. In preferred embodiments, the gene fusion is a ROS1 fusion, an ALK fusion, an NTRK1 fusion or a RET fusion. Fusions that are particularly preferred are ROS1-CD74, ROS1-SLC34A2, ROS1-SDC4, ROS1-EZR, ALK-EML4, KIF5B-RET, TRIM33-RET, CCDC6-RET, NCOA4-RET, KIF5B-ALK, TPM3-NTRK1, SQSTM1-NTRK1, CD74-NTRK1 and MPRIP-NTRK1.

Accordingly, the region that is targeted by the forward or reverse region-specific primers may be selected from the group consisting of ROS1, ALK, EML4, BCR, ABL, TCF3, PBX1, ETV6, RUNX1, MLL, AF4, SIL, TAL1, RET, NTRK1, PAX8, PPARG, MECT1, MAML2, TFE3, TFEB, BRD4, NUT, ETV6, NTRK3, TMPRSS2, NTRK2 and ERG. In one embodiment, the region that is targeted by the forward or reverse region-specific primers may be selected from the group consisting of ROS1, ALK, NTRK1 and RET. In one embodiment, the forward region-specific primer targets a first fusion partner and the reverse region-specific primer targets a second fusion partner, wherein the fusion partners are selected from the group consisting of CD74-ROS1, SLC34A2-ROS1, SDC4-ROS1, EZR-ROS1, GOPC-ROS1, LRIG3-ROS1, TPM3-ROS1, PPFIBP1-ROS1, EML4-ALK, BCR-ABL, TCF3-PBX1, ETV6-RUNX1, MLL-AF4, SIL-TAL1, RET-NTRK1, PAX8-PPARG, MECT1-MAML2, TFE3-TFEB, BRD4-NUT, ETV6-NTRK3, TMPRSS2-ERG, TPM3-NTRK1, SQSTM1-NTRK1, CD74-NTRK1, MPRIP-NTRK1 and TRIM24-NTRK2. In one embodiment, the fusion partners are selected from the group consisting of ROS1-CD74, ROS1-SLC34A2, ROS1-SDC4, ROS1-EZR, ALK-EML4, KIF5B-RET, TRIM33-RET, CCDC6-RET, NCOA4-RET, KIF5B-ALK, TPM3-NTRK1, SQSTM1-NTRK1, CD74-NTRK1 and MPRIP-NTRK1.

In one embodiment, the method comprises the use of a pool of primers that targets at least two genes selected from the group consisting of ROS1, ALK, EML4, BCR, ABL, TCF3, PBX1, ETV6, RUNX1, MLL, AF4, SIL, TAL1, RET, NTRK1, PAX8, PPARG, MECT1, MAML2, TFE3, TFEB, BRD4, NUT, ETV6, NTRK3, TMPRSS2, NTRK2 and ERG. In one embodiment, the method comprises the use of a pool of primers that targets at least two genes selected from the group consisting of ROS1, ALK, NTRK1 and RET. Of course, more than two primer pools can be used. For example, in one embodiment, the method comprises the use of a pool of primers that targets ROS1, a pool of primers that targets ALK, a pool of primers that targets NTRK1 and a pool of primers that targets RET. Alternatively, at least 5 genes, at least 10 genes or all of the genes ROS1, ALK, EML4, BCR, ABL, TCF3, PBX1, ETV6, RUNX1, MLL, AF4, SIL, TAL1, RET, NTRK1, PAX8, PPARG, MECT1, MAML2, TFE3, TFEB, BRD4, NUT, ETV6, NTRK3, TMPRSS2, NTRK2 and ERG may be targeted in a single reaction. When a fusion between two different genes is present, a first pool targeting a first gene in the fusion acts as a pool of forward primers, and a second pool targeting the second gene in the fusion acts as a pool of reverse primers. The forward and reverse designations are arbitrary and can be swapped; they are provided herein for the sake of clarity.

The fusions may be intronic fusions (a fusion between two introns), exonic fusions (a fusion between two exons), intron/exon fusions (a fusion between an intron from one region or gene and an exon from another region or gene), fusions between two intergenic regions, or intronic/intergenic or intergenic/exonic fusions. Most often the fusion will be an intronic fusion.

Fusion Calls

The present disclosure is particularly useful in determining the presence of genomic fusions. When determining the presence or absence of a gene fusion, a bioinformatic analysis may need to be undertaken to determine whether or not the detection of an amplification product from the selective PCR is actually indicative of the presence of a gene fusion. The decision on whether or not a gene fusion is present is known as a “fusion call”.

As discussed for FIG. 8, amplicons are generated by two primers amplifying a fusion event and are then sequenced by NGS (the dotted line indicates the read; black arrows indicate sequencing primers). The analysis method involves determining the minimum number of base pairs that need to be sequenced (for each primer site) to uniquely match a target region. A strong anchor has sufficient base pairs sequenced to uniquely match a target region. A weak anchor matches not only the target region but also other regions in the reference genome, and therefore does not uniquely match the target region. The method uses the known primer binding locations to determine the expected sequence within the reads, which removes the need for aligning reads to the entire reference genome. In the example of FIG. 8a, the amplicon has two strong anchors, with both the ALK and EML4 portions (in this example) of the read uniquely matching the ALK and EML4 reference sequences. In the example of FIG. 8b, the amplicon has one strong anchor and one weak anchor. The ALK portion of the read uniquely matches a target region, but the EML4 portion does not uniquely match the reference genome.

In some embodiments, the method comprises sequencing the reaction product from a first or subsequent PCR and matching the sequences to a reference sequence or one or more databases of reference sequences (also referred to herein as primer information databases). The reference sequences may be a reference genomic region or sequence, or one or more databases of reference genomic regions or sequences. The art may refer to “mapping”; however, the present methods do not entail “mapping” as it is understood in the art, since the origin of the read is inferred from the presence of sequence that matches the sequences of the primers, which have known genomic locations, and the read is compared to the expected sequences that lie downstream of the primer sequences. The term mapping, on the other hand, is used to describe the comparison of a sequence with a long genomic sequence, such as a human genome, and the identification of its likely origin from within this large sequence without prior knowledge. Therefore, in embodiments described herein, the analysis of the sequence read comprises matching one or more portions of the sequence read to one or more primer information databases comprising a plurality of reference sequences. Since the technique is distinct from traditional mapping techniques, the reference sequences contained in the one or more primer information databases against which the one or more portions of the sequence reads are matched have a maximum length of up to 1 kb.
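A minimal sketch of this kind of primer-anchored matching is given below. It is illustrative only: the database layout, the field names, the primer sequences, the region names and the coordinates are all invented for the example, and exact string matching is assumed (a practical implementation would probably need to tolerate sequencing errors).

```python
# Hypothetical primer information database: every sequence, name and
# coordinate below is invented for illustration.
PRIMER_DB = {
    "ACGTTGCA": {"region": "REGION_1", "position": 1000,
                 "expected_downstream": "GGATCCGATTACA"},
    "TTGACCGA": {"region": "REGION_2", "position": 5000,
                 "expected_downstream": "CCATGGTTAACGT"},
}


def match_read_start(read, primer_db):
    """Identify the primer at the start of a read and compare what follows
    with the expected downstream reference sequence.

    Returns (region, position, n_agree), where n_agree is the number of
    consecutive bases after the primer that agree with the reference,
    or None if no known primer starts the read.
    """
    for primer, entry in primer_db.items():
        if read.startswith(primer):
            observed = read[len(primer):]
            n_agree = 0
            for obs, exp in zip(observed, entry["expected_downstream"]):
                if obs != exp:
                    break
                n_agree += 1
            return entry["region"], entry["position"], n_agree
    return None


# A read that follows the REGION_1 reference for six bases and then diverges.
read = "ACGTTGCA" + "GGATCC" + "TTTTTT"
print(match_read_start(read, PRIMER_DB))
```

Because the lookup is keyed on known primers and short reference sequences, no alignment against a whole genome is involved. The point at which the read stops agreeing with the expected sequence could, under these assumptions, hint at where a junction lies.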

Methods of the disclosure may comprise comparing at least two portions of the sequence read with one or more databases of reference sequences, wherein each portion comprises the sequence of a primer binding site and an adjacent downstream (i.e. 3′) sequence. The one or more databases of reference sequences may comprise the genomic location corresponding to each primer binding site in the database.

Matching the relevant portion(s) of the sequence read to a reference genomic sequence or, preferably, to one or more databases of reference genomic sequences, allows the skilled person to determine the precise genomic rearrangement event that has occurred. An advantage of the methods and kits disclosed herein is that neither prior knowledge of the presence of a genomic rearrangement, nor details of the precise rearrangement that has occurred, are needed for the method to be carried out. In addition, unnecessary sequencing of reaction products not arising from a genomic rearrangement event is not required, drastically reducing the cost and effort required to determine the presence or absence of the genomic rearrangement. Furthermore, the computational power required is reduced, since the methods do not comprise mapping or aligning the sequence reads or portions thereof to a reference genome, which may be very long. Instead, the sequence reads, or portions thereof, are matched to one or more databases comprising reference genomic sequences, for example wherein the reference genomic sequences in the one or more databases have a maximum length of up to 1 kb.

Since the method is carried out on DMOIs that are derived from more than one section of a genome (when a genomic rearrangement event has occurred), methods disclosed herein may comprise matching the DMOI to two or more regions from the reference genome. For example, the method may comprise identifying two genes from which the sequence of the DMOI is derived. In the event of a genomic rearrangement, regions from each of these two genes have been brought into sufficient proximity for a specific PCR amplification product to be produced in the selective PCR carried out using the pool of forward and reverse primers.

The methods disclosed herein may comprise identifying the presence of a forward primer binding site and a reverse primer binding site and uniquely matching both to their respective genomic location. References to “uniquely matching” herein refer to being able to match a sequence to a single genomic location in a reference genome or reference genomic sequence (or database of genomic sequences). Uniquely matching can therefore only occur when there is sufficient sequence information to rule out any other locations. The length of sequence required varies from location to location, depending on the heterogeneity of a given region. Regions that include several repeats, for example, will therefore require longer sequences to enable unique matching to take place. Other locations will only need short sequences (perhaps just the sequence of the primer itself) to uniquely match the primer to a given genomic location.
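The idea that each primer site needs a location-dependent minimum length of sequence can be sketched as follows. This is an assumption-laden illustration rather than the disclosed algorithm: the function name is hypothetical, the sequences are invented, and uniqueness is tested by exact substring search against a small list of other reference sequences.

```python
def min_unique_length(site_seq, other_seqs, primer_len):
    """Smallest number of bases (primer plus downstream sequence) needed
    for the start of `site_seq` to occur in none of `other_seqs`.

    Returns None if even the full sequence is not unique. Exact substring
    matching is an illustrative simplification.
    """
    for k in range(primer_len, len(site_seq) + 1):
        prefix = site_seq[:k]
        if not any(prefix in other for other in other_seqs):
            return k
    return None


# Invented example: the first nine bases of the site also occur elsewhere,
# so ten bases are needed before the match becomes unique.
site = "ACGTACGTTTGA"            # primer (8 bases) plus downstream sequence
elsewhere = ["GGACGTACGTTCCC"]   # another reference containing a near repeat
print(min_unique_length(site, elsewhere, primer_len=8))
```

Under these assumptions, an anchor in a sequence read could be treated as strong when at least this many bases were sequenced from the primer site, and weak otherwise.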

A fusion call may be made if one, but preferably both, of the forward and reverse primers and their downstream sequences can be uniquely matched to a region in the genome. For example, a fusion call may be made if the forward primer can be uniquely matched to a sequence in the first region of interest and/or if the reverse primer can be uniquely matched to a sequence in the second region of interest.
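The decision rule above can be written down as a small sketch. The `Anchor` class, its fields and the `require_both` flag are hypothetical names introduced for illustration. The requirement that the two anchors lie in different regions is an assumption added here to reflect that the forward and reverse primers target different regions of interest.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    """One end of a read: the region its primer targets and the number of
    reference locations that the primer plus downstream sequence matches."""
    region: str
    n_matches: int

    @property
    def strong(self):
        # A strong anchor matches exactly one reference location.
        return self.n_matches == 1


def fusion_call(forward, reverse, require_both=True):
    """Return True if a fusion call would be made under this sketch."""
    if forward.region == reverse.region:
        return False  # assumption: a fusion joins two different regions
    if require_both:
        return forward.strong and reverse.strong
    return forward.strong or reverse.strong


# Two strong anchors, then one strong and one weak anchor.
print(fusion_call(Anchor("ALK", 1), Anchor("EML4", 1)))
print(fusion_call(Anchor("ALK", 1), Anchor("EML4", 3), require_both=False))
```

Setting `require_both=True` corresponds to the preferred case in which both ends are uniquely matched, and `require_both=False` to the more permissive case in which one unique match suffices.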

Accordingly, in one embodiment there is provided a method for determining the presence or absence of a gene fusion in a DMOI (specifically, an amplicon, since the sequence that is provided is the sequence of a product of a selective PCR), the method comprising:

-   a. providing the sequence of a DMOI;
-   b. determining, from a population of known primers, the location of at least one forward primer binding site and the location of at least one reverse primer binding site in the DMOI;
-   c. matching the sequence of the DMOI to at least one region of interest in a reference genome;
-   d. optionally determining the potential location of a gene fusion between two different regions of interest of the genome; and
-   e. determining whether a gene fusion is present in the DMOI.

The DMOI is the amplification product of a PCR using the population of known forward and reverse primers (for example, the primer pools or sets disclosed herein).

In some embodiments, the method comprises matching the sequence of the DMOI to at least two different regions of interest in a reference genome. The two regions may be suspected of having undergone a genomic rearrangement or gene fusion event.

In one embodiment, there is provided a method for determining thepresence or absence of a gene fusion in a DMOI, the method comprising:

-   -   a. providing the sequence of a DMOI as a sequence read;    -   b. identifying in the sequence read the presence of at least one        forward primer binding site and the presence of at least one        reverse primer binding site from a population of forward and        reverse primers;    -   c. determining the corresponding genomic locations of the        forward and reverse primer binding sites by reference to the        sequences of the forward and reverse primer binding sites and        the sequence downstream and adjacent to the forward and reverse        primer binding sites in the sequence read; and    -   d. determining the presence or absence of a gene fusion in the        DMOI.

The sequence read provided in step (a) is provided by sequencing one or more DMOIs from a patient sample. For example, the sequence read may provide the sequence of a ctDNA molecule obtained from a patient. The sequence of the sequence read is therefore derived from the genome of a patient, or more specifically, the genome of a tumour or other cancer present in the patient that gave rise to the ctDNA. Of course, the sequence may be provided according to any of the methods described herein for determining the presence or absence of a gene fusion event.

Step (b) comprises identifying in the sequence read the presence of at least one forward primer binding site and the presence of at least one reverse primer binding site from a population of corresponding forward and reverse primers. The forward and reverse primers correspond with the forward and reverse primer binding sites in that one is the complement of the other, such that a primer would anneal to a corresponding primer binding site. The sequences of the primers and/or the primer binding sites are known. The sequences of the primers and/or primer binding sites may be contained in one or more databases. Since the sequence reads can be provided by methods described herein, it will be apparent to the reader that the forward and reverse primers can be the pool of region-specific forward and reverse primers described above that are used to selectively amplify a gene fusion between two regions of interest.

When the presence of at least one forward and at least one reverse primer binding site in the sequence read has been identified, step (c) determines the corresponding genomic locations of the forward and reverse primer binding sites by reference to the sequences of the forward and reverse primer binding sites and the sequence downstream and adjacent to the forward and reverse primer binding sites in the sequence read. The corresponding genomic location is the original genomic location in the patient's genome or cancer that gave rise to the DMOI, which in turn gave rise to the sequence read provided in step (a). This step therefore determines the corresponding genomic locations for the forward and reverse primer binding sites in the genome that gave rise to the ctDNA.

The downstream sequences are adjacent, meaning immediately adjacent, to the primer binding sites in the sequence read.

In step (d), when the genomic locations of the forward and reverse primer binding sites are different, or are suspected of being different, a gene fusion event is present. When the genomic locations of the forward and reverse primer binding sites are the same, a gene fusion event is not present. “Different” here refers to the different first and second regions of interest that were targeted by the pools of region-specific forward and reverse primers. For example, when the genomic locations of the forward and reverse primer binding sites identified in the sequence read are at least 1 kb apart in a genome that has not undergone a gene fusion, or are usually found on different chromosomes or in different genes, a gene fusion event is present in the sequence read (specifically, the DMOI that gives rise to the sequence read).
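The comparison in step (d) can be sketched in Python as follows. The `Location` record, its field names and the toy coordinates are hypothetical and serve only to illustrate the three criteria mentioned above (different chromosomes, different genes, or at least 1 kb apart in the unrearranged genome).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical record of where a primer binding site maps in the reference."""
    chromosome: str
    position: int  # coordinate in the reference (unrearranged) genome
    gene: str


def locations_differ(fwd: Location, rev: Location, min_distance: int = 1000) -> bool:
    """Return True when the two primer binding sites map to different regions."""
    if fwd.chromosome != rev.chromosome:
        return True
    if fwd.gene != rev.gene:
        return True
    # Same chromosome and same gene: fall back on reference distance.
    return abs(fwd.position - rev.position) >= min_distance


# Toy values, not real genomic coordinates.
forward_site = Location("chrA", 10_000, "geneA")
reverse_other_gene = Location("chrA", 10_400, "geneB")
reverse_nearby = Location("chrA", 10_400, "geneA")
reverse_far = Location("chrA", 12_500, "geneA")
```

Under these assumptions a call would be supported for `reverse_other_gene` and `reverse_far`, but not for `reverse_nearby`, which lies in the same gene and less than 1 kb away.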

It may only be possible to uniquely match one of the forward and reverse primer binding sites in the sequence read to a corresponding location in a reference genome. However, even if it is not possible to uniquely match both forward and reverse primer binding sites in the sequence read to corresponding locations in a reference genome, a gene fusion event can still be predicted. For example, if a first primer binding site in a sequence read is uniquely matched to a genomic location, the second primer binding site may be matched to a plurality of different genomic locations. If all of those locations are different to the location that has been uniquely identified as giving rise to the first primer binding site in the sequence read, then a gene fusion is still likely to be present. Therefore, the step of determining the corresponding genomic locations of the forward and reverse primer binding sites by reference to the sequences downstream and adjacent to the forward and reverse primer binding sites in the sequence read may comprise matching at least one of the forward and reverse primer binding sites in the sequence read to a unique genomic location. The other primer binding site in the sequence read may be matched to one or more genomic locations.

In one embodiment, the method comprises:

-   a. providing the sequence of a DMOI as a sequence read;
-   b. identifying in the sequence read the presence of at least one forward primer binding site and the presence of at least one reverse primer binding site from a population of corresponding forward and reverse primers whose sequences are known;
-   c. when the presence of at least one forward and at least one reverse primer binding site in the sequence read has been identified, determining the corresponding genomic locations of the forward and reverse primer binding sites by reference to the sequences of the forward and reverse primer binding sites and the sequences downstream of and adjacent to the forward and reverse primer binding sites in the sequence read; and
-   d. determining the presence or absence of a gene fusion in the DMOI, wherein when the forward and reverse primer binding sites are in different genes, a gene fusion event is present.

The step of determining the corresponding genomic locations of the forward and reverse primer binding sites refers to uniquely identifying the genomic sequence in a reference genome that gave rise to the sequence of at least one of the forward or reverse primer binding sites. In one embodiment, the step of determining the corresponding genomic locations of the forward and reverse primer binding sites refers to uniquely identifying the genomic sequence in a reference genome that gave rise to the sequence of both the forward and reverse primer binding sites.

In some embodiments, the step of identifying the presence of the at least one forward and at least one reverse primer binding site in the sequence read and the step of determining the corresponding genomic locations of the forward and reverse primer binding sites comprise interrogating one or more databases. The one or more databases may comprise:

-   a. the genomic location for each forward and reverse primer binding site in the primer population;
-   b. the sequence of each forward and reverse primer binding site in the primer population;
-   c. the downstream sequence in the corresponding genomic location for each of the forward and reverse primer binding sites in the primer population (optionally wherein the length of the downstream sequence is at least 1 base pair); and
-   d. the minimum number of base pairs downstream of each primer binding site required to uniquely match a primer binding site from a sequence read to the corresponding genomic location.

Accordingly, for each forward and reverse primer in the primer population, the database or databases comprise:

-   a. the genomic location for the primer binding site;
-   b. the sequence of the primer binding site (and/or the sequence of the corresponding primer);
-   c. the sequence downstream of and adjacent to the primer binding site in the corresponding genome (or at the corresponding genomic location, optionally wherein the length of the downstream sequence is at least 1 base pair); and
-   d. the minimum number of base pairs downstream of the primer binding site required to uniquely match a primer binding site from a sequence read to the corresponding location in the genome.

The database may further comprise the location and/or identity of SNPs or other polymorphisms. For example, the downstream sequence for each of the primers may take into account the presence of polymorphisms (including SNPs, LTRs, STRs) to assist accurate identification. The inclusion of polymorphisms assists the use of the database across broader patient populations.

In some embodiments, the step of interrogating the one or more databases comprises comparing the sequence downstream of the forward and reverse primer binding sites in the sequence read with the corresponding downstream sequences in the one or more databases. In other words, the method compares the sequences downstream of the forward and reverse primer binding sites in the sequence read with the downstream sequences provided in the database(s) for the corresponding forward and reverse primers.

The database or databases may be referred to as primer databases or primer information databases. The information may be spread across multiple databases. For example, one database may contain the sequence of each primer in the primer population and assign each primer in the primer population a unique label. The label can then be used to interrogate a separate database that provides the remaining information (such as the downstream sequences and length of downstream sequence required to uniquely map the primer to a specific region in a corresponding genome) arranged according to the unique labels of the primers. The specific arrangement and storage of the information is therefore not crucial.
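One possible arrangement of two such databases is sketched below in Python. The labels, sequences, locations and field names are invented for illustration and do not describe any particular implementation: the first mapping assigns each primer binding site sequence a unique label, and the second holds the remaining information keyed by that label.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PrimerInfo:
    """Hypothetical per-primer record held in the second database."""
    genomic_location: str
    downstream_sequences: Tuple[str, ...]  # more than one if polymorphic
    min_downstream_bp: int  # bases needed for a unique match


# Database 1: primer binding site sequence -> unique primer label.
SEQUENCE_TO_LABEL = {
    "ACGTAC": "F001",
    "TTGACC": "R001",
}

# Database 2: label -> remaining information for that primer.
PRIMER_INFO = {
    "F001": PrimerInfo("geneA:intron1", ("GGATCC",), 4),
    "R001": PrimerInfo("geneB:intron2", ("AGGCTT", "AGGCTA"), 5),
}


def lookup(site_sequence: str) -> Optional[Tuple[str, PrimerInfo]]:
    """Resolve a primer binding site sequence to its label and record."""
    label = SEQUENCE_TO_LABEL.get(site_sequence)
    if label is None:
        return None
    return label, PRIMER_INFO[label]
```

Because the second mapping is keyed only by label, the two stores could equally be merged into one table or split further without changing the lookup logic.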

The method may comprise determining the “anchor strength” of the forward and reverse primer binding sites in the sequence read, wherein:

-   a. a weak anchor is defined as a primer binding site in a sequence read having a downstream sequence in the sequence read that matches the downstream sequence in the corresponding primer in the primer database, but said matching downstream sequence in the sequence read is shorter than the length of the downstream sequence required in the database of reference genomic sequences to uniquely match the sequence obtained to the corresponding genomic location; and
-   b. a strong anchor is defined as a primer binding site in a sequence read having a downstream sequence in the sequence read that matches the downstream sequence in the corresponding primer in the primer database, and said matching downstream sequence in the sequence read is equal to or longer than the length of the downstream sequence required in the database of reference genomic sequences to uniquely match the sequence obtained to the corresponding genomic location.

In some embodiments, a gene fusion is called when both the forward and reverse primer binding sites are identified as strong anchors. In other embodiments, a gene fusion is called when at least one of the forward and reverse primer binding sites is identified as a strong anchor.
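A minimal Python sketch of the anchor classification and of the two calling policies is given below. It assumes that “matching” is measured as the number of identical bases counted from the base immediately after the primer binding site, and it adds a "none" category for sites with no matching downstream bases; both choices, and all names, are illustrative assumptions.

```python
def matched_length(observed: str, expected: str) -> int:
    """Number of leading bases shared by the observed and expected downstream sequences."""
    count = 0
    for obs_base, exp_base in zip(observed, expected):
        if obs_base != exp_base:
            break
        count += 1
    return count


def anchor_strength(observed: str, expected: str, min_required: int) -> str:
    """Classify a primer binding site as a 'strong' or 'weak' anchor, or 'none'."""
    shared = matched_length(observed, expected)
    if shared >= min_required:
        return "strong"
    if shared > 0:
        return "weak"
    return "none"


def anchors_support_call(fwd: str, rev: str, require_both: bool = True) -> bool:
    """Apply one of the two policies: both anchors strong, or at least one strong."""
    strengths = (fwd, rev)
    if require_both:
        return all(s == "strong" for s in strengths)
    return any(s == "strong" for s in strengths)
```

For example, with an expected downstream sequence of "GGATCC" and a required length of 4, an observed downstream sequence of "GGATCCTT" would be a strong anchor, whereas "GGA" would only be a weak one.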

In one embodiment, the method comprises

-   a. providing the sequence of a DMOI as a sequence read;
-   b. interrogating one or more primer information databases to:
    -   i. identify in the sequence read the presence of at least one forward primer binding site and the presence of at least one reverse primer binding site from a population of forward and reverse primers; and
    -   ii. determine the corresponding genomic locations of the forward and reverse primer binding sites;
-   c. wherein the one or more primer information databases comprise:
    -   i. the genomic location for each forward and reverse primer binding site in the primer population;
    -   ii. the sequence of each forward and reverse primer binding site in the primer population;
    -   iii. the downstream sequence in the corresponding genomic location for each of the forward and reverse primer binding sites in the primer population; and
    -   iv. the minimum number of base pairs downstream of each primer binding site required to uniquely match a primer binding site from a sequence read to a given genomic location; and
-   d. determining the presence or absence of a gene fusion in the DMOI.

The step of interrogating the database to identify in the sequence read the presence of at least one forward primer binding site and the presence of at least one reverse primer binding site may comprise comparing the sequence of the sequence read (or portions thereof) with the forward and reverse primer binding site sequences in the primer information database. The sequence of each forward and reverse primer binding site in the primer population and their corresponding downstream sequences may be referred to as the reference sequences (or reference genomic sequences). It is against these reference sequences that the sequence read (or portions thereof) is matched. Specifically, the primer binding sites in the sequence read may be compared to the sequences in the one or more databases corresponding to the forward and reverse primer binding sites in the primer population (part (ii) of the database above) and the adjacent downstream sequence in the sequence read may be compared to the corresponding downstream sequences in the matching genomic location (part (iii) of the database above). In some embodiments, the maximum length of the sequences in (ii) and (iii) above is 1 kb.

The step of determining the corresponding genomic locations of the forward and reverse primer binding sites may comprise comparing the sequences downstream of the forward and reverse primer binding sites in the sequence read with the downstream sequences provided in the one or more primer information databases. A unique genomic location may be assigned to the forward and/or reverse primer binding site from the sequence read when two conditions are met: the downstream sequence in the sequence read is the same as the, or a, downstream sequence for a corresponding primer in the primer information database; and the length of that matching downstream sequence is equal to or greater than the minimum number of base pairs downstream of the primer binding site required to uniquely match the primer binding site from a sequence read to the corresponding genomic location.

Of course, interrogation of the database may provide a plurality of genomic locations for each of the primer binding sites in the sequence reads, depending on the strength of the anchor. Accordingly, in one embodiment, the method comprises identifying all the possible genomic locations for both the forward and reverse primer binding sites in the sequence read.

A fusion may be called when at least one of the forward or reverse primers is uniquely matched to a genomic location that is different to all of the possible genomic locations identified for the other primer binding site in the sequence read.
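This rule can be expressed compactly in Python by treating the possible genomic locations of each primer binding site as a set. The function name and the toy location labels are assumptions made for illustration.

```python
from typing import Set


def fusion_from_candidates(fwd_locations: Set[str], rev_locations: Set[str]) -> bool:
    """Call a fusion when one site has exactly one candidate location and that
    location is absent from every candidate location of the other site."""
    if not fwd_locations or not rev_locations:
        return False  # at least one site could not be placed at all
    for unique_side, other_side in (
        (fwd_locations, rev_locations),
        (rev_locations, fwd_locations),
    ):
        if len(unique_side) == 1 and unique_side.isdisjoint(other_side):
            return True
    return False


# Toy labels standing in for genomic locations (or genes).
unique_vs_distinct = fusion_from_candidates({"geneA"}, {"geneB", "geneC"})
unique_vs_overlap = fusion_from_candidates({"geneA"}, {"geneA", "geneB"})
neither_unique = fusion_from_candidates({"geneA", "geneB"}, {"geneC", "geneD"})
```

Only the first of the three toy cases yields a call: in the second the ambiguous site could still originate from the same location as the unique site, and in the third neither site is uniquely placed.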

In one embodiment, the method comprises

-   a. providing the sequence of a DMOI as a sequence read;
-   b. providing one or more primer information databases, wherein the one or more primer information databases comprise:
    -   i. the genomic location for each forward and reverse primer binding site in a primer population;
    -   ii. the sequence of each forward and reverse primer binding site in the primer population;
    -   iii. the downstream sequence in the corresponding genomic location for each of the forward and reverse primer binding sites in the primer population; and
    -   iv. the minimum number of base pairs downstream of each primer binding site required to uniquely match a primer binding site from a sequence read to a given genomic location;
-   c. comparing the sequence of the sequence read with the forward and reverse primer binding site sequences in the one or more primer information databases to identify in the sequence read the presence and identity of at least one forward primer binding site and the presence and identity of at least one reverse primer binding site from the population of forward and reverse primers;
-   d. comparing the sequences downstream of the forward and reverse primer binding sites in the sequence read with the corresponding downstream sequences provided in the one or more primer information databases;
-   e. assigning to the forward and/or reverse primer binding site from the sequence read the corresponding genomic location of the primer binding site in the one or more primer information databases when:
    -   i. the downstream sequence in the sequence read is the same as the downstream sequence for the corresponding primer in the primer information database; and
    -   ii. the length of the downstream sequence in the sequence read that is the same as the downstream sequence for the corresponding primer in the primer information database is equal to or greater than the minimum number of base pairs downstream of the primer binding site required to uniquely match the primer binding site from the sequence read to the corresponding genomic location; and
-   f. determining the presence or absence of a gene fusion in the DMOI, wherein a gene fusion is present when the forward and reverse primer binding sites in the sequence read are assigned different genomic locations.
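Steps (a) to (f) above can be sketched end to end in Python as follows. This is only an illustrative outline under several stated assumptions: the read has been trimmed so that it begins with the forward primer sequence, the reverse primer is found by reading the reverse complement of the read, only exact matches are considered, and every name, sequence and location is invented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class PrimerRecord:
    """Hypothetical entry of a primer information database (items i to iv)."""
    location: str        # i.   genomic location
    sequence: str        # ii.  primer sequence, 5' to 3'
    downstream: str      # iii. reference sequence 3' of the primer
    min_downstream: int  # iv.  bases needed for a unique match


def shared_prefix_len(a: str, b: str) -> int:
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def assign_location(strand: str, primers: Iterable[PrimerRecord]) -> Optional[str]:
    """Steps (c) to (e): find the primer at the start of the strand and assign
    its location only if enough downstream bases match the database."""
    for primer in primers:
        if strand.startswith(primer.sequence):
            observed = strand[len(primer.sequence):]
            if shared_prefix_len(observed, primer.downstream) >= primer.min_downstream:
                return primer.location
    return None


def call_fusion(read: str, fwd_primers, rev_primers) -> bool:
    """Step (f): a fusion is called when both sites are assigned different locations."""
    fwd_location = assign_location(read, fwd_primers)
    rev_location = assign_location(reverse_complement(read), rev_primers)
    return None not in (fwd_location, rev_location) and fwd_location != rev_location


# Toy data: a read joining a geneA forward site to a geneB reverse site.
fwd = PrimerRecord("geneA", "ACGTAC", "GGATCC", 4)
rev = PrimerRecord("geneB", "TTGACC", "AGGCTT", 4)
fusion_read = fwd.sequence + fwd.downstream + reverse_complement(rev.sequence + rev.downstream)
```

Reading the reverse complement of the read means that “downstream” keeps the same meaning (3′ of the primer) for both primers, so a single assignment function can serve both ends of the read.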

In one embodiment, the method comprises

-   a. providing the sequence of a DMOI as a sequence read;
-   b. providing one or more primer information databases, wherein the one or more primer information databases comprise:
    -   i. the genomic location for each forward and reverse primer binding site in a primer population;
    -   ii. the sequence of each forward and reverse primer binding site in the primer population;
    -   iii. the downstream sequence in the corresponding genomic location for each of the forward and reverse primer binding sites in the primer population; and
    -   iv. the minimum number of base pairs downstream of each primer binding site required to uniquely match a primer binding site from a sequence read to a given genomic location;
-   c. comparing the sequence of the sequence read with the forward and reverse primer binding site sequences in the one or more primer information databases to identify in the sequence read the presence and identity of at least one forward primer binding site and the presence and identity of at least one reverse primer binding site from the population of forward and reverse primers;
-   d. comparing the sequences downstream of the forward and reverse primer binding sites in the sequence read with the corresponding downstream sequences provided in the one or more primer information databases;
-   e. assigning to the forward and/or reverse primer binding site from the sequence read the corresponding genomic location of the primer binding site in the one or more primer information databases when:
    -   i. the downstream sequence in the sequence read is the same as the downstream sequence for the corresponding primer in the primer information database; and
    -   ii. the length of the downstream sequence in the sequence read that is the same as the downstream sequence for the corresponding primer in the primer information database is equal to or greater than the minimum number of base pairs downstream of the primer binding site required to uniquely match the primer binding site from the sequence read to the corresponding genomic location; and
-   f. determining the presence or absence of a gene fusion in the DMOI, wherein a gene fusion is present when at least one of the forward and reverse primer binding sites in the sequence read is assigned to only one genomic location (i.e. is a strong anchor).

The other primer binding site in the sequence read may be assignable to a plurality of genomic locations, for example if the downstream sequence in the sequence read does not match the expected downstream sequence in the one or more primer information databases, or the downstream sequence in the sequence read does match the expected downstream sequence in the one or more primer information databases, but the matching downstream sequence is not of sufficient length (i.e. it is a weak anchor). However, one can still be confident of a fusion if the genomic locations assignable to this weak anchor are all different from the genomic location assigned to the strong anchor.

“Genomic locations” can also refer simply to genes. Hence, a gene fusion may be present when at least one of the forward and reverse primer binding sites in the sequence read is assigned to only one gene and the other primer binding site in the sequence read is assigned to a plurality of genes that are all different from the gene assigned to the first primer binding site in the sequence read.

The methods disclosed herein can be carried out without having to align or map the sequence reads or portions thereof to a reference genomic sequence. In some embodiments, the methods disclosed herein can be carried out without having to align or map the sequence reads or portions thereof to a reference genomic sequence that is more than 10 kb in length, since only the sequences in the one or more primer information databases need to be interrogated.

The forward and reverse primer population includes the primers used in the assays to detect the genomic rearrangement and to provide the sequence reads. The one or more primer information databases therefore include the corresponding information relating to the primer pools of the present invention. The database also takes into account polymorphisms in the different genomic locations covered by the primer population. For example, a given primer in the primer pool might have a plurality of downstream sequences that differ according to the polymorphisms located at that region of the genome. Accordingly, the database or databases may comprise the sequence or at least one downstream sequence that is adjacent to the primer binding site in the corresponding genome for each primer in the one or more primer information databases.
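Where a primer has several alternative downstream sequences, one simple approach (assumed here purely for illustration) is to compare the observed downstream sequence with every stored variant and keep the longest match. The helper name and the toy sequences are hypothetical; `os.path.commonprefix` is used only as a convenient character-by-character common-prefix function.

```python
import os
from typing import Sequence


def best_downstream_match(observed: str, variants: Sequence[str]) -> int:
    """Longest run of leading bases shared between the observed downstream
    sequence and any of the stored downstream variants for a primer."""
    best = 0
    for variant in variants:
        # commonprefix compares the strings character by character.
        shared = len(os.path.commonprefix([observed, variant]))
        best = max(best, shared)
    return best


# Two toy variants differing at the sixth base (a SNP-like difference).
variants_for_primer = ("AGGCTT", "AGGCTA")
match_with_snp = best_downstream_match("AGGCTAGG", variants_for_primer)
```

With only the first variant stored, the toy read would match over five bases; including the second variant extends the match to six.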

The databases described herein contain sufficient information to enable the skilled person to compare the potential primer binding site identified in the sequence read with the primer binding sites of the primers present in the pool of primers used to generate the sequence read. If the sequence of the primer binding site itself uniquely matches the genome (for example because the primer binding site is of sufficient length and/or it is present in a particularly heterogeneous section of the reference genome) then the sequence of the primer itself may be sufficient to uniquely match the corresponding region in the DMOI to a unique position in the reference genome. However, if the primer binding site is not sufficiently long (or is in a less heterogeneous section of the reference genome), then additional downstream sequences (i.e. on the 3′ side of the primer binding site) are required to determine a unique position in the genome. Hence for some (and indeed most) primers it is necessary to include the downstream sequences. The skilled person can compare the sequence that is downstream of the primer binding site in the sequence read with the expected downstream sequence for a given genomic location to see if they match.

In some embodiments, the primer information database or databases comprise the downstream sequences of at least 50% of the primers in the primer pool. Preferably the reference database includes the downstream sequence for all primers that do not themselves uniquely match the reference genome. The length of the downstream sequence can vary according to the primer and its binding site. In some embodiments, the length of the downstream sequences is at least 1 nucleotide. Preferably the length is at least 10 nucleotides. The downstream sequences are generally immediately adjacent to the primer binding sites (there are no nucleotides between the last nucleotide of the primer binding site and the first nucleotide of the downstream sequence).

In some embodiments a fusion call may be made when both primer binding sites are strong anchors. However, a fusion call can still be made with only one strong anchor. Nevertheless, when the primer binding site is close to the location of the gene fusion in the sequence read, there may only be a small number of nucleotides that are the same between the sequence read and the reference downstream genomic sequence. To make a fusion call with only one strong anchor, it is preferred that the match between the downstream sequence in the DMOI and the downstream sequence in the reference primer database is at least 5 nucleotides, optionally at least 10 nucleotides. Therefore, in one embodiment, the method comprises determining the distance of each of the primer binding sites from the potential location of the gene fusion in the sequence read.
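The distance check can be illustrated with a short Python sketch. It assumes that positions are zero-based indices into the sequence read, that the primer binding site lies before the potential fusion location, and that the number of bases between the two is an upper bound on the downstream match; the function names and the default threshold of 5 nucleotides (taken from the preference stated above) are illustrative.

```python
def bases_before_fusion(site_end: int, fusion_position: int) -> int:
    """Number of read bases between the end of the primer binding site and the
    potential location of the gene fusion (never negative)."""
    return max(0, fusion_position - site_end)


def enough_sequence_for_single_anchor(
    site_end: int, fusion_position: int, min_match: int = 5
) -> bool:
    """True when the site is far enough from the fusion for a downstream match
    of at least min_match nucleotides to be possible."""
    return bases_before_fusion(site_end, fusion_position) >= min_match
```

For instance, a primer binding site ending at index 20 with a potential fusion at index 23 leaves only three downstream bases, which falls below the preferred minimum.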

The downstream sequences provided in the one or more primer information databases will be derived from a reference genome. The reference genome will be one that is suitable for the analysis that is being undertaken. For example, for an analysis carried out on sequences derived from a human sample, a human genome will be used as a reference genome and the source for the downstream sequences (a complete or partial human genome). “Reference genome” herein includes fragments of a genome that correspond to the regions of interest, for example specific genes that have undergone a genomic rearrangement event (or are suspected of having undergone such a rearrangement). The genomic locations that are determined are the genomic locations in the reference genome or partial reference genome.

The step of matching or comparing a sequence downstream of the forward and reverse primer binding sites in the sequence read to at least one downstream genomic location in one or more databases may comprise: for a given primer binding site, interrogating the one or more primer databases for a corresponding downstream sequence, and comparing the downstream sequence from the primer database to the sequence downstream of the primer binding site in the sequence read.

The method may comprise a step of determining the potential location of a gene fusion between two different primer binding sites in the sequence read. In one embodiment, the method therefore comprises matching a portion of the sequence read (including the sequence of a forward primer binding site) to a first genomic location and matching a different portion of the sequence read (including the sequence of a reverse primer binding site) to a second genomic location.
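One way of estimating the potential location of the fusion, assumed here for illustration only, is to extend an exact match from each end of the sequence read: the prefix of the read is compared with the reference for the first genomic location and the suffix with the reference for the second. In the sketch below both references are given in the orientation of the read, and all names and sequences are hypothetical.

```python
import os
from typing import Tuple


def common_prefix_len(a: str, b: str) -> int:
    # commonprefix compares the two strings character by character.
    return len(os.path.commonprefix([a, b]))


def fusion_window(read: str, first_ref: str, second_ref: str) -> Tuple[int, int]:
    """Return (start, end) read indices bounding the potential fusion location.

    first_ref starts at the forward primer binding site; second_ref ends at
    the reverse primer binding site. start == end indicates a clean junction,
    whereas start < end indicates bases attributable to both references or to
    neither.
    """
    prefix = common_prefix_len(read, first_ref)
    suffix = common_prefix_len(read[::-1], second_ref[::-1])
    suffix_start = len(read) - suffix
    return min(prefix, suffix_start), max(prefix, suffix_start)


# Toy read: six bases from the first region joined to eight from the second.
toy_read = "AAAACC" + "ACGTACGT"
clean_junction = fusion_window(toy_read, "AAAACCCCGGGG", "TTTTACGTACGT")
```

If the first reference happened to continue with the same bases as the second (for example "AAAACCAC"), the window would widen to cover the shared bases, reflecting that the junction cannot be placed more precisely from the read alone.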

Of course, the fusion call methods disclosed herein can be combined with the methods of determining the presence of a genomic rearrangement event (such as a gene fusion) disclosed herein. For example, in one embodiment the method comprises:

-   a. contacting a sample comprising DNA molecules of interest (DMOIs) with a pool of at least 20 region-specific forward primers and a pool of at least 20 region-specific reverse primers, wherein:
    -   i. each of the forward primers in the forward primer pool comprises a sequence specific for a first region of interest and a first primer binding site; and
    -   ii. each of the reverse primers in the reverse primer pool comprises a sequence specific for a second, different, region of interest and a second primer binding site;
-   b. amplifying the DMOIs using the region-specific primers;
-   c. conducting PCR using forward primers that target the first primer binding site and reverse primers that target the second primer binding site;
-   d. sequencing the PCR amplification product to provide a library of sequence reads, wherein the sequence reads comprise the sequence of a forward and/or reverse primer used in step (a);
-   e. identifying in at least one of the sequence reads the presence of at least one region-specific forward primer binding site and the presence of at least one region-specific reverse primer binding site from the pool of forward and reverse primers;
-   f. determining the corresponding genomic locations of the forward and reverse primer binding sites by reference to the sequences of the forward and reverse primer binding sites and the sequences downstream and adjacent to the forward and reverse primer binding sites in the sequence read; and
-   g. determining the presence or absence of a gene fusion in the DMOI.

Of course, the more detailed analysis methods can also be combined with the different embodiments relating to the processing and sequencing of the DMOIs.

For example, one embodiment provides:

-   -   a. contacting a sample comprising DNA molecules of interest (DMOIs) with a pool of at least 20 region-specific forward primers and a pool of at least 20 region-specific reverse primers, wherein:
        -   i. each of the forward primers in the forward primer pool comprises a sequence specific for a first region of interest and a first primer binding site; and
        -   ii. each of the reverse primers in the reverse primer pool comprises a sequence specific for a second, different, region of interest and a second primer binding site;
    -   b. amplifying the DMOIs using the region-specific primers;
    -   c. conducting PCR using forward primers that target the first primer binding site and reverse primers that target the second primer binding site;
    -   d. sequencing the PCR amplification product to provide a library of sequence reads, wherein the sequence reads comprise the sequence of a forward and/or reverse primer used in step (a);
    -   e. providing one or more primer information databases, wherein the one or more primer information databases comprise:
        -   i. the genomic location for each forward and reverse primer binding site in a primer population;
        -   ii. the sequence of each forward and reverse primer binding site in the primer population;
        -   iii. the downstream sequence in the corresponding genomic location for each of the forward and reverse primer binding sites in the primer population; and
        -   iv. the minimum number of base pairs downstream of each primer binding site required to uniquely match a primer binding site from a sequence read to a given genomic location; and
    -   f. comparing the sequence of the sequence reads with the forward and reverse primer binding site sequences in the one or more primer information databases to identify in at least one of the sequence reads the presence and identity of at least one forward primer binding site and the presence and identity of at least one reverse primer binding site from the population of forward and reverse primers;
    -   g. comparing the sequences downstream of the forward and reverse primer binding sites in the sequence read with the corresponding downstream sequences provided in the one or more primer information databases;
    -   h. assigning to the forward and/or reverse primer binding site from the sequence read the corresponding genomic location of the primer binding site in the one or more primer information databases when:
        -   i. the downstream sequence in the sequence read is the same as the downstream sequence for the corresponding primer in the primer information database; and
        -   ii. the length of the downstream sequence in the sequence read that is the same as the downstream sequence for the corresponding primer in the primer information database is equal to or greater than the minimum number of base pairs downstream of the primer binding site required to uniquely match the primer binding site from the sequence read to the corresponding genomic location;
    -   i. determining the presence or absence of a gene fusion in the DMOI, wherein a gene fusion is present when the forward and reverse primer binding sites in the sequence read are assigned different genomic locations.
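The comparison and assignment logic of steps (f) to (i) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline of the disclosure: the `PrimerRecord` fields, the function names and the toy sequences are hypothetical, each read is assumed to contain a primer binding site immediately followed by its downstream sequence, and exact base matching is used where a practical pipeline would probably tolerate sequencing errors.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PrimerRecord:
    """One entry of a hypothetical primer information database."""
    name: str
    location: str     # genomic location label (assumed format)
    site_seq: str     # sequence of the primer binding site
    downstream: str   # reference sequence immediately downstream of the site
    min_unique: int   # minimum downstream bases needed for a unique match


def assign_location(read: str, primer: PrimerRecord) -> Optional[str]:
    """Return the primer's genomic location if the read supports it, else None."""
    start = read.find(primer.site_seq)
    if start < 0:
        return None  # binding site not present in this read
    observed = read[start + len(primer.site_seq):]
    matched = 0
    # Count how many leading downstream bases agree with the reference.
    for read_base, ref_base in zip(observed, primer.downstream):
        if read_base != ref_base:
            break
        matched += 1
    return primer.location if matched >= primer.min_unique else None


def identify(read: str, primers: Iterable[PrimerRecord]) -> Optional[str]:
    """Return the first genomic location that any primer in the pool supports."""
    for primer in primers:
        location = assign_location(read, primer)
        if location is not None:
            return location
    return None


def is_fusion(fwd_read: str, rev_read: str,
              fwd_primers: Iterable[PrimerRecord],
              rev_primers: Iterable[PrimerRecord]) -> bool:
    """Call a fusion only when both ends are assigned, and to different locations."""
    fwd_loc = identify(fwd_read, fwd_primers)
    rev_loc = identify(rev_read, rev_primers)
    return fwd_loc is not None and rev_loc is not None and fwd_loc != rev_loc
```

In this sketch a read whose downstream sequence matches for fewer bases than `min_unique` is left unassigned, so a non-specific amplification product does not lead to a fusion call.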

Additional steps may also be taken to help identify fusion calls. For example, an assessment may be made to determine if the detected or suspected fusion is an in-frame fusion. If the detected or suspected fusion is not an in-frame fusion, it may be discarded and not called as a true fusion. More specifically, it has been noted by the present inventors that the vast majority of true gene fusion events, in particular in cancer patients, result in in-frame products when the DNA is transcribed to RNA. Although most gene fusion events occur between introns, the newly adjacent exons (i.e. those brought into closer proximity as a result of the gene fusion) are paired such that the coding frame matches. For example, one exon may end at the 3^(rd) base of the codon reading frame, and the adjacent end of the next exon will start at the 1^(st) base of the codon reading frame. Alternatively, one exon may end at the 2^(nd) base of the codon reading frame, and the adjacent end of the next exon will start at the 3^(rd) base of the codon reading frame, or one exon may end at the 1^(st) base of the codon reading frame, and the adjacent end of the next exon will start at the 2^(nd) base of the codon reading frame. By reviewing the sequencing information provided by the methods disclosed herein, and once the location of the genomic breakpoint has been uniquely matched to a genome, the skilled person can correlate the gene fusion with the newly adjacent exons and determine if the resulting RNA transcript would produce an in-frame product. If it would not, the call can be discarded as not being a true gene fusion event. It is noted that the precise location of a breakpoint in an intron is not relevant to determining whether or not the breakpoint would produce an in-frame product. This is because the DNA is transcribed into RNA and the introns are then removed by splicing, which brings the adjacent exons together. Therefore, it is simply the pairing of newly adjacent exons that needs to be analysed when determining if an in-frame product will be produced. Accordingly, one embodiment comprises identifying the exons that are now adjacent as a result of the gene fusion and determining if the fusion would result in an in-frame product according to the last nucleotide base of one exon and the first nucleotide base of the next exon.
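The exon pairing rule described above (3^(rd) to 1^(st), 2^(nd) to 3^(rd), 1^(st) to 2^(nd)) can be expressed as a short check. The following Python sketch is illustrative only; the function names are hypothetical, and it assumes that the codon position of the last base of the upstream exon and of the first base of the downstream exon have already been looked up from a gene annotation.

```python
def codon_position(cds_offset: int) -> int:
    """Codon position (1, 2 or 3) of a base, given its 1-based offset in the coding sequence."""
    if cds_offset < 1:
        raise ValueError("cds_offset must be 1 or greater")
    return (cds_offset - 1) % 3 + 1


def is_in_frame(upstream_end_pos: int, downstream_start_pos: int) -> bool:
    """True when the downstream exon starts at the codon position that
    directly follows the position at which the upstream exon ends."""
    for pos in (upstream_end_pos, downstream_start_pos):
        if pos not in (1, 2, 3):
            raise ValueError("codon positions must be 1, 2 or 3")
    # After position 3 the frame wraps around to position 1.
    return downstream_start_pos == upstream_end_pos % 3 + 1
```

Because only the two codon positions are needed, the check is independent of where in the intron the breakpoint falls, which is the point made in the paragraph above.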

For example, in one embodiment, the location of the genomic breakpoint is used to predict whether a resulting RNA fusion product would be spliced to produce an in-frame product. Only fusion breakpoints predicted to produce an in-frame product are called as a gene fusion event. Conducting such an additional analysis step helps to further check for artefacts, such as non-specific amplification and false positives, and to identify true variants.

In one embodiment, the method comprises determining whether the detected gene fusion would result in an in-frame product when the DNA is transcribed to RNA. A fusion call is made when the detected gene fusion would result in an in-frame product when the fused DNA is transcribed to RNA.

The present disclosure also provides the primer information databases described herein. In one aspect, the invention provides one or more primer information databases, the one or more primer information databases comprising or collectively comprising, for a population of primers:

-   -   i. the genomic location for each forward and reverse primer binding site in the primer population;
    -   ii. the sequence of each forward and reverse primer binding site in the primer population;
    -   iii. the downstream sequence in the corresponding genomic location for each of the forward and reverse primer binding sites in the primer population; and
    -   iv. the minimum number of base pairs downstream of each primer binding site required to uniquely match a primer binding site from a sequence read to a given genomic location.
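Item (iv) can be illustrated with a small sketch. Assuming, hypothetically, that the downstream reference sequences of every other genomic site matching the same primer binding site are already known, the minimum number of bases is the length of the shortest prefix of the intended downstream sequence that none of the other sites share. The function name and inputs below are illustrative, and this is one possible way of deriving the value, not necessarily the one used in the disclosure.

```python
from typing import List, Optional


def min_unique_downstream(target: str, others: List[str]) -> Optional[int]:
    """Smallest prefix length of `target` that differs from the same-length
    prefix of every sequence in `others`; None if no such length exists.

    Assumes each sequence in `others` is at least as long as `target`.
    """
    for k in range(1, len(target) + 1):
        prefix = target[:k]
        if all(other[:k] != prefix for other in others):
            return k
    return None  # target cannot be distinguished within its own length
```

For example, with an intended downstream sequence `"ACGTAA"` and off-target sequences `"ACGTTT"` and `"ACCAAA"`, the first four bases are shared with the first off-target site, so five downstream bases are needed.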

The one or more databases disclosed herein may be provided on a computer readable storage medium. The present disclosure therefore provides a computer readable storage medium comprising the one or more primer information databases disclosed herein.

Interrogation of the one or more databases may be carried out using a computer. Similarly, the methods of analysing the sequence reads may be conducted using a computer.

Samples

The DMOIs may be contained in or derived from a sample from a patient. In some embodiments, the sample is a biological sample obtained from a subject, or a sample containing DMOIs that is extracted from a biological sample obtained from a subject. The patient sample can be a tissue sample, for example a surgical sample. Preferably the sample is a liquid biopsy sample, such as blood, plasma, serum, urine, seminal fluid, stool, sputum, pleural fluid, ascitic fluid, synovial fluid, cerebrospinal fluid, lymph, nipple fluid, cyst fluid, or bronchial lavage. In some embodiments the sample is a cytological sample or smear or a fluid containing cellular material, such as cervical smear, nasal brushing, esophageal sampling by a sponge (cytosponge), endoscopic/gastroscopic/colonoscopic biopsy or brushing, cervical mucus or brushing.

Many of the above samples can be obtained non-invasively, and can therefore be taken regularly without great risk or discomfort to the subject. Methods disclosed herein may comprise a step of obtaining a sample from a patient. Alternatively, the methods may be carried out on samples previously obtained from a patient (i.e., ex vivo/in vitro methods). In one embodiment, samples and/or DMOIs are obtained by an in vivo/ex vivo nucleic acid harvesting technique, for example dialysis or functionalised wire.

Samples may be obtained from patients suspected of having a particular disease or condition, such as cancer. Such a disease or condition can be diagnosed, prognosed and monitored, and therapy can be determined, based on the methods, systems and kits described herein. Samples may be obtained from humans or from animals, such as a domesticated animal, for example a cow, chicken, pig, horse, rabbit, dog, cat, or goat. Usually, a sample will be derived from a human.

To obtain a blood sample, any technique known in the art may be used, e.g., a syringe or other vacuum suction device. A blood sample can be optionally pre-treated or processed prior to tagging and analysis. Examples of pre-treatment steps include the addition of a reagent such as a stabiliser, a preservative, a fixant, a lysing reagent, a diluent, an anti-apoptotic reagent, an anti-coagulation reagent, an anti-thrombotic reagent, a magnetic property regulating reagent, a buffering reagent, an osmolality regulating reagent, a pH regulating reagent, and/or a crosslinking reagent. In addition, plasma may be obtained from the blood sample, and the plasma may be used in the subsequent analysis.

When obtaining a sample from a human or an animal (e.g., blood sample), the amount can vary depending upon human or animal size and the condition being screened. In some embodiments, up to 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 mL of a sample is obtained. In some embodiments, 1-50, 2-40, 3-30, or 4-20 mL of sample is obtained. In some embodiments, more than 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 mL of a sample is obtained.

A sample may be processed prior to undergoing further analysis. Such processing steps may comprise purification (for example removal of cells and/or debris from the sample), extraction or isolation of the DMOI. In the case of, for example, blood samples, the DMOI may be extracted from the blood sample for analysis. The amount of DNA present in the extracted sample may also be quantified prior to analysis.

In some embodiments, the sample may be obtained from the patient by an in vivo/ex vivo nucleic acid harvesting technique, for example dialysis or functionalised wire.

In particular embodiments, the method comprises a step of obtaining the sample from a patient. In other embodiments, the sample or DMOI is simply provided, as a sample was obtained at a prior point in time. The skilled person is aware of suitable techniques for obtaining, storing, stabilising and/or transporting samples prior to analysis.

In some embodiments, the DMOI is contained in or derived from a patient sample, and the patient sample is processed prior to analysis to determine the presence or absence of a genomic rearrangement. In some embodiments, the method comprises:

-   -   a. purification of the sample to obtain a purified sample comprising the DMOIs (for example removal of cells and/or debris from the sample); and/or
    -   b. extraction or isolation of the DMOIs from the patient sample.

In one embodiment, the method comprises obtaining a blood sample from a patient, obtaining plasma from the blood sample, and optionally extracting the DMOIs from the plasma sample. Methods disclosed herein may also comprise a step of purifying the amplicons from the amplification product after the or each PCR step.

Additional steps may be taken to minimise false positives and increase sensitivity of the methods. For example, in preferred embodiments, methods disclosed herein can be carried out more than once (e.g. at least twice) to minimise false positives or negatives. In preferred embodiments, the methods disclosed herein are carried out on two or more samples derived from a patient, or a patient sample is split into two or more test samples (prior to or after processing of the patient sample) and the methods are carried out on the two or more test samples. Carrying out the methods in duplicate can help to eliminate false positives, avoiding unnecessary sequencing of nucleic acids, and also increases the sensitivity of the assay. When the methods are repeated in this way, the method may comprise comparing the analysis from the two samples or two test samples. In some embodiments comprising this comparison step, the presence of a PCR amplification product from the selective PCR in both samples or both test samples is indicative of a genomic rearrangement event. As such, a fusion call is only made when a genomic rearrangement is detected in both samples or both test samples. The presence of strong or weak anchors may influence the decision on a fusion call, as discussed above.
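The replicate comparison described above amounts to keeping only those candidate rearrangements that are seen in every replicate. A minimal Python sketch follows; the `Breakpoint` representation (a pair of location labels) and the function name are hypothetical, and the sketch deliberately ignores the strong/weak anchor considerations, which would need to be applied separately.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (forward primer location, reverse primer location).
Breakpoint = Tuple[str, str]


def concordant_fusions(replicates: Iterable[Set[Breakpoint]]) -> Set[Breakpoint]:
    """Return the candidate fusions detected in every replicate."""
    replicate_sets = [set(r) for r in replicates]
    if len(replicate_sets) < 2:
        raise ValueError("at least two replicates are required")
    return set.intersection(*replicate_sets)
```

A candidate observed in only one replicate is dropped, which reflects the rule that a fusion call is only made when the rearrangement is detected in both test samples.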

Other Methods of the Invention

The present disclosure provides a method, the method comprising:

-   -   a. providing a sample from a patient, said sample comprising one or more DMOIs (in particular cell-free DNA, such as ctDNA); and
    -   b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein.

The method may further comprise processing of the sample (for example extracting or isolating the DMOI from the patient sample) prior to determining the presence or absence of a genomic rearrangement.

Other such methods disclosed herein include a method of diagnosing disease (such as cancer), a method of determining disease prognosis (such as cancer prognosis), a method of determining disease remission or relapse (such as cancer remission or relapse), a method of detecting progression of disease (such as cancer), or a method of determining the presence or absence of residual disease (such as residual cancer, wherein the DMOI is circulating tumour DNA (ctDNA)).

Regarding such methods, the methods may comprise determining the presence or absence of a genomic rearrangement in a patient using a method disclosed herein. For example, the method may comprise providing a sample from a patient, said sample comprising a plurality of cell-free DNA (cfDNA) molecules (DMOIs), optionally processing the sample, and determining the presence or absence of the genomic rearrangement. The nature and/or abundance of the genomic rearrangement being detected may be indicative of the presence of disease, the prognosis of the disease, disease remission, disease relapse, disease progression, or the presence of residual disease.

In preferred embodiments, the genomic rearrangement event is a gene fusion event, such as a ROS1 fusion, ALK fusion, RET fusion or NTRK1 fusion. The cancer may be any cancer, but of particular interest is lung cancer.

In some embodiments, the methods comprise determining the presence and/or abundance of a genomic rearrangement in a sample from a patient who has previously had a sample analysed according to a method disclosed herein.

The present disclosure also provides methods of treating disease, such as cancer. The method may comprise the steps of:

-   -   a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
    -   b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein; and
    -   c. administering a treatment, such as a therapy, to the patient, or recommending a treatment to the patient.

The step of administering or recommending a treatment/therapy will be dependent on the analysis in step b). For example, it may be the case that no disease is detected and hence no treatment is required. Alternatively, the method may detect cancer relapse, and hence treatment would be necessary. In some embodiments, the method may recommend the patient for treatment based on the presence or absence of the genomic rearrangement event. In some embodiments, the method comprises characterising the patient's disease (such as cancer) and administering or recommending the patient for an appropriate treatment.

When the disease is cancer, example treatments may include chemotherapy, radiotherapy, immunotherapy, targeted therapy and/or surgery.

Typical chemotherapeutic agents include alkylating agents (for example nitrogen mustards (such as mechlorethamine, cyclophosphamide, melphalan, chlorambucil, ifosfamide and busulfan), nitrosoureas (such as N-Nitroso-N-methylurea (MNU), carmustine (BCNU), lomustine (CCNU), semustine (MeCCNU), fotemustine and streptozotocin), tetrazines (such as dacarbazine, mitozolomide and temozolomide), aziridines (such as thiotepa, mitomycin and diaziquone), cisplatins and derivatives thereof (such as carboplatin and oxaliplatin), and non-classical alkylating agents (such as procarbazine and hexamethylmelamine)), antimetabolites (for example anti-folates (such as methotrexate and pemetrexed), fluoropyrimidines (such as fluorouracil and capecitabine), deoxynucleoside analogues (such as cytarabine, gemcitabine, decitabine, Vidaza, fludarabine, nelarabine, cladribine, clofarabine and pentostatin) and thiopurines (such as thioguanine and mercaptopurine)), anti-microtubule agents (for example Vinca alkaloids (such as vincristine, vinblastine, vinorelbine, vindesine, and vinflunine) and taxanes (such as paclitaxel and docetaxel)), platins (such as cisplatin and carboplatin), topoisomerase inhibitors (for example irinotecan, topotecan, camptothecin, etoposide, doxorubicin, mitoxantrone, teniposide, novobiocin, merbarone, and aclarubicin), and cytotoxic antibiotics (for example anthracyclines (such as doxorubicin, daunorubicin, epirubicin, idarubicin, pirarubicin, aclarubicin and mitoxantrone), bleomycins, mitomycin C, mitoxantrone, and actinomycin), and combinations thereof.

For lung cancer patients, in particular non-small-cell lung carcinoma (NSCLC) patients, the treatment may include EGFR inhibitors (such as erlotinib (Tarceva), afatinib (Gilotrif), gefitinib (Iressa) or osimertinib (Tagrisso)), ALK inhibitors (such as crizotinib (Xalkori), ceritinib (Zykadia) or alectinib (Alecensa)), MET inhibitors (such as tivantinib (ARQ197), cabozantinib (XL184) or crizotinib), or ROS1 inhibitors (such as foretinib or crizotinib).

The treatment may comprise surgery, for example resection of a tumour. In particular, resection may be recommended if metastasis or disease progression has been predicted or is suspected.

The present disclosure also provides a method of determining a treatment regimen for a patient having, or suspected of having, disease (such as cancer), comprising:

-   -   a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
    -   b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein; and
    -   c. selecting a treatment regimen for the patient according to the presence or absence of a genomic rearrangement in the one or more DMOIs.

In one embodiment there is provided a method of predicting a patient's responsiveness to a treatment, such as a cancer treatment, comprising:

-   -   a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
    -   b. determining the presence or absence of a genomic rearrangement event according to a method disclosed herein; and
    -   c. predicting a patient's responsiveness to a cancer treatment according to the presence or absence of a genomic rearrangement in the one or more DMOIs.

Methods of determining the presence or absence of a genomic rearrangement event include methods of characterising a genomic rearrangement event or methods of characterising a patient's disease.

The methods of the present disclosure also allow detection of minimal residual disease in patients. For example, following treatment for cancer, the methods disclosed herein may be used to detect residual disease using a sample obtained from the patient. The potential for relapse can therefore be detected early and appropriate additional treatment steps can be taken.

Methods of generating reports are also provided herein. For example, in one embodiment there is provided a method of generating a report, comprising:

-   -   a. providing a sample from a patient, said sample comprising one or more cell-free DNA molecules of interest (DMOIs);
    -   b. determining the presence or absence of a genomic rearrangement event according to a method as described herein; and
    -   c. generating a report comprising a listing of genomic rearrangement events determined to be present in step (b).

The report may additionally or alternatively provide the genomic coordinates of a genomic rearrangement determined in step (b). The report may further provide or suggest suitable treatments for the patient according to the genomic rearrangements determined in step (b).

A report may be generated in any of the diagnostic or prognostic methods described herein. For example, the report may include a prediction of a patient's responsiveness to a treatment, a suitable treatment regimen for a patient, a diagnosis (for example a cancer diagnosis), a disease prognosis (such as a cancer prognosis), a determination of disease remission or relapse (for example cancer remission or relapse), a responsiveness of a patient to a therapy (for example to cancer therapy), a detection of disease progression (such as cancer progression), a determination of the presence or absence of residual disease (such as residual cancer), etc.

Kits and Primer Pools

The present disclosure also provides kits comprising different components used in the methods disclosed herein. A kit of parts disclosed herein may comprise a plurality of forward primers and a plurality of reverse primers suitable for detecting a genomic rearrangement event (such as a gene fusion). The forward primers are each specific for a first region of interest, and the reverse primers are each specific for a second, different, region of interest. In one embodiment, the first and second regions of interest are in different genes. The different regions of interest or different genes may be located on different chromosomes when a genomic rearrangement event has not occurred. A plurality of primers is referred to herein as a set or pool of primers. References to “multiple” primers herein refer to a collection of at least two primers, but preferably at least 20 forward and at least 20 reverse primers are used. Primers targeted to regions suspected of being involved in a genomic rearrangement event are referred to as the selective PCR primers or region-specific primers.

In some embodiments, the first and second regions of interest are located on the same chromosome but are located such that little PCR amplification product (or only non-specific PCR amplification product) is generated in the absence of a genomic rearrangement event when the forward and reverse primers are used in a PCR. In some embodiments, the first and second regions of interest are located on the same chromosome but are separated by at least 160 base pairs. When more than two regions of interest are targeted, each region is separated from all the other targeted regions in the primer kit.

In certain embodiments, the forward primers tile the first region of interest and/or the reverse primers tile the second region of interest. The forward and/or reverse primers may tile the first and/or second region of interest at intervals of from about 10 to about 2000 base pairs, from about 10 to about 1000 base pairs, from about 10 to about 500 base pairs, from about 10 to about 250 base pairs, from about 10 to about 150 base pairs, from about 25 to about 125 base pairs, from about 50 to about 100 base pairs, or from about 60 to about 90 base pairs. In one embodiment, the forward and reverse primers tile the first and second region of interest, respectively, at intervals from about 60 to about 90 base pairs.

In some embodiments, the forward and reverse primers tile the first and second region of interest, respectively, at intervals of up to about 50, about 100, about 150, about 250, about 500, about 1000, about 2000 or about 2500 base pairs. In one embodiment, the forward and reverse primers tile the first and second region of interest, respectively, at intervals of up to about 100 base pairs.

When more than two regions of interest are targeted, each region can be tiled using pools or sets of primers in the same way.
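Tiling at a fixed interval can be sketched as follows. This Python example only computes candidate primer start coordinates; the function name and the half-open coordinate convention are assumptions, and real primer design would also have to account for melting temperature, specificity and primer-primer interactions, which are not modelled here.

```python
from typing import List


def tile_positions(region_start: int, region_end: int, interval: int) -> List[int]:
    """Candidate primer start positions spaced `interval` bases apart
    across the half-open region [region_start, region_end)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")
    return list(range(region_start, region_end, interval))
```

With these assumptions, a 2000 base pair region tiled every 75 base pairs gives 27 candidate positions, consistent with a pool of at least 20 primers per region.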

The selective PCR primers can be specific to different sequences in the corresponding regions of interest, wherein the different sequences in a given region of interest do not overlap with each other.

The disclosure also provides pools or sets of selective PCR primers (optionally as part of a kit), wherein the pool or set comprises at least 20, at least 50 or at least 100 different forward primers and/or at least 20, at least 50 or at least 100 different reverse primers. In a preferred embodiment, the pool or kit comprises at least 20 different selective PCR forward primers and at least 20 different selective PCR reverse primers. More primers can be included for targeting larger and/or multiple regions of interest in a single reaction, for example at least 200 different forward and reverse selective PCR primers may be present.

In one embodiment, the pool or kit of primers comprises:

-   -   a. a set of at least 20 forward primers, wherein the forward primers are specific for a first region of interest, and wherein each member of the set of primers targets a different DNA sequence in the first region of interest, and optionally wherein there are multiple copies of each member of the set of forward primers; and
    -   b. a set of at least 20 reverse primers, wherein the reverse primers are specific for a second region of interest, and wherein each member of the set of primers targets a different DNA sequence in the second region of interest, and optionally wherein there are multiple copies of each member of the set of reverse primers;
        wherein the first and second regions of interest are different. Preferably, the forward primers further comprise a first universal primer binding site and the reverse primers further comprise a second universal primer binding site. The first and second primer binding sites are different from each other.

Multiple copies of each type of primer may be present in the pool or kit.

The kit may comprise multiple sets of selective PCR primers. For example, in one embodiment, the kit comprises at least 3 sets of primers, at least one of which is a set of forward primers and at least one of which is a set of reverse primers. Each set of primers comprises a plurality of primers that tile a region of interest, as discussed elsewhere. Each region of interest may be different. In some embodiments, the kits comprise at least 4, at least 5, at least 6, at least 7, at least 8, at least 9 or at least 10 pools of primers, each specific for a different region of interest. In one embodiment, the kit comprises at least 5 pools of forward selective PCR primers and at least 5 pools of reverse selective PCR primers.

The sets of selective PCR primers generally will target regions of interest that are suspected of having undergone a genomic rearrangement. In some cases, a set of forward selective PCR primers in one pair of forward and reverse primer sets may act as a set of reverse selective PCR primers in another pair of forward and reverse primer sets.

In some embodiments, the selective PCR primers target a gene fusion. At least one pair of forward and reverse primers may anneal to a DMOI within 500 base pairs from each other, or within 400 base pairs from each other, or within 300 base pairs from each other, or within 200 or within 175 base pairs from each other when a genomic rearrangement, such as a gene fusion, is present.

In one embodiment, there is provided a kit of primers comprising at least one set of primers that target a region of interest in the ROS1 gene, at least one set of primers that target a region of interest that is a potential fusion partner for the ROS1 gene, at least one set of primers that target the ALK gene, and at least one set of primers that target a region of interest that is a potential fusion partner for the ALK gene.

In one embodiment, there is provided a kit of primers comprising at least four sets of primers, wherein the four sets of primers target ALK, ROS1, RET and NTRK1. In another embodiment, there is provided a kit of primers comprising at least 16 sets of primers that target ALK, EML4, ROS1, CD74, SLC34A2, SDC4, EZR, RET, KIF5B, CCDC6, NCOA4, TRIM33, NTRK1, MPRIP, SQSTM1 and TPM3. Each set of primers targets a different gene.

In some embodiments, the selective PCR primers in the kit or primer pools or sets are gene specific primers. The different sets of selective PCR primers may be specific for different genes.

Each selective PCR primer of the kit or pool comprises a region-specific sequence and may optionally comprise an adaptor sequence and/or a UPS. The adaptor sequence is an adaptor sequence for sequencing the amplicons from the PCR.

Preferably, each selective PCR primer of the kit or pool comprises a region-specific sequence and a universal primer binding site. In some embodiments, the kit or pool further comprises additional forward and reverse primer pairs specific to the universal primer sites on the selective PCR primers specific to the regions of interest. In such embodiments, when provided as a kit, the additional forward and reverse primer pairs may be disposed separately from the selective PCR primers. Additionally, the additional primers in the second PCR comprise a UPS-specific sequence and an adaptor sequence for sequencing the amplicons from the selective PCR.

As noted, the methods disclosed herein can be used to target multiple regions of interest and hence multiple possible genomic rearrangements. Not only can the methods be used to detect any kind of possible fusion between two regions of interest (i.e. at any point along their sequence, even without prior knowledge of the location of the rearrangement), but fusions between different regions (e.g. genes) can be detected in a single reaction by including multiple sets of primers. Each primer set will tile a given region of interest, with each primer set targeting a different region. Multiple sets of forward and reverse primers can be included. Whilst each set will generally target a different region, what is considered a forward primer set for one possible genomic rearrangement event could be a reverse primer set for a different genomic rearrangement event.

In some embodiments, the kits include instructions for use, in particular instructions relating to the methods disclosed herein.

Of course, the kits and pools disclosed herein can be used in the methods disclosed herein. Furthermore, the primer information databases may comprise the relevant information for all of the primers in the primer pools disclosed herein.

In a very specific embodiment there is provided a method comprising the following steps:

-   -   1. 10 ml of blood is collected into Cell-Free DNA BCT Streck blood tubes.
    -   2. Blood is processed to plasma using methods known in the art.
    -   3. DNA is extracted from plasma using QIAamp Circulating Nucleic Acid Kit (Qiagen) following the manufacturer's protocols.
    -   4. DNA is quantified by digital droplet PCR to determine the copies of cfDNA within the sample.
    -   5. Selective PCR with primer pools targeting a rearrangement (e.g. EML4-ALK) is performed on the DNA sample. The primer pool also contains 18 primer pairs targeting regions with known population SNPs.
        -   Selective PCR is performed on two replicates of the same sample.
    -   6. Amplicons generated from Step 5 are purified using SPRISelect (Beckman Coulter) following the manufacturer's protocols.
    -   7. A 2^(nd) PCR is performed using primers targeting the UPS attached in Step 5. These primers also contain sample barcodes/indexes which are used for sample identification.
        -   This step further amplifies the products of Step 5.
    -   8. Amplicons generated from Step 7 are purified using SPRISelect.
    -   9. Samples are pooled together into a single reaction tube to generate a pooled library.
    -   10. A region between 200-350 bp is size selected using the Pippin Prep System (Sage Science).
    -   11. The library from Step 10 is quantified using Library Quantification Kit (KAPA Biosystems).
    -   12. The quantified pooled library is sequenced on the NextSeq Platform using 300 cycles of sequencing.

        Analysis

    -   13. Samples are de-multiplexed. Indexes are used to identify unique samples.
    -   14. Analysis pipeline performed (details to follow).
    -   15. Fusion breakpoint is called depending on weak/strong anchor methodology.
    -   16. Patient specific report is generated indicating presence/absence of fusion and indicating treatments/trials relevant to the genetic alteration.

The present disclosure provides a method for detecting genomic rearrangements, including gene fusions, comprising:

-   a. providing a patient blood sample comprising ctDNA;
-   b. extracting the ctDNA from the sample;
-   c. conducting a selective multiplex PCR on the extracted ctDNA using a pool of at least 20 forward primers specific for a first gene and a pool of at least 20 reverse primers specific for a second gene, wherein the pool of forward primers tiles a first gene (or a region thereof) and the pool of reverse primers tiles a second gene (or a region thereof), with spaces of between 50 and 100 nucleotide bases between adjacent primers in the pools, wherein the PCR incorporates universal primer binding sites into the amplicons that are generated;
-   d. conducting a further PCR using primers specific for the universal primer binding sites incorporated in step c.;
-   e. sequencing the amplification product from step d.; and
-   f. determining the presence or absence of a genomic rearrangement according to the sequence of the amplification product.
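By way of non-limiting illustration, the tiling arrangement recited in step c. can be sketched in Python as follows. The function name `tile_primer_starts`, the 20-base primer length, the 75-base gap and the region coordinates are hypothetical values chosen for the example and are not parameters taken from the present disclosure; practical primer design would also take into account melting temperature, specificity and primer-dimer formation.

```python
def tile_primer_starts(region_start, region_end, primer_len=20, gap=75):
    """Return start coordinates of primers tiled across [region_start, region_end).

    `gap` is the number of bases between the end of one primer and the start
    of the next. Both primer_len and gap are illustrative assumptions.
    """
    if not 50 <= gap <= 100:
        raise ValueError("gap should be between 50 and 100 bases")
    starts = []
    pos = region_start
    # Place primers until the next one would run past the end of the region.
    while pos + primer_len <= region_end:
        starts.append(pos)
        pos += primer_len + gap
    return starts


# Hypothetical 2,000-base region of a first gene
forward_starts = tile_primer_starts(0, 2000)
print(len(forward_starts), "primers; first starts:", forward_starts[:3])
```

The same function could be applied to a region of the second gene to lay out the reverse primer pool, with the reverse primers designed against the opposite strand.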

The preferred features for the second and subsequent aspects are as provided for the first aspect, mutatis mutandis.

The present invention will now be further explained by reference to a number of non-limiting examples.

EXAMPLES

Aspects of the present teachings can be further understood in light of the following examples, which should not be construed as limiting the scope of the present teachings in any way.

Example 1: Detection of EML4-ALK Variant at a Range of Allelic Fractions

A custom cell free DNA reference standard containing an EML4-ALK fusion of sequence GAAGTTCCTATACTTTCTAGAGAATAGGAACTTC (SEQ ID NO: 1) at an allelic fraction of 2.5% was obtained from Horizon Discoveries. This reference standard was diluted in sheared (average 188 bp) human placental DNA (Bioline) to achieve allelic fractions of 1%, 0.5%, 0.25%, 0.125% and 0.0625%. Three samples were created at each allelic fraction.

Each sample was split into two replicates, each containing a total of 4000 input copies. PCR amplification was performed on the two replicates using the ALK primer panel (table 1). Each PCR contained 25 uL DNA, 27.5 uL Platinum SuperFi 2× Master Mix (Invitrogen) and 2.5 uL of the ALK primer pool (for primer concentration see table 1). PCR cycling followed the manufacturer's instructions. The PCR product was cleaned up using SPRIselect reagent (Beckman Coulter B23319) following the manufacturer's protocol. DNA was eluted in 18 uL and a second PCR using indexed Illumina primers was performed. Each PCR contained 15 uL DNA, 17.5 uL Platinum SuperFi 2× Master Mix (Invitrogen) and 2.4 uL indexed Illumina primers. PCR cycling followed the manufacturer's instructions. The PCR product was cleaned up once using SPRIselect reagent (Beckman Coulter B23319) following the manufacturer's protocol. Indexed samples from different replicates were pooled into a tube containing 10 uL 10 mM Tris-HCl pH 8. Samples were size selected for 195-350 bp using a 2% Agarose Dye Free cassette and marker L on the Pippin Prep (Sage Science), following the manufacturer's instructions. Size selected DNA was quantified by qPCR using a KAPA Library quantification kit (KAPABIOSYSTEMS), following the manufacturer's instructions. Quantified libraries were sequenced on the NextSeq500 Illumina platform and data analysis was performed.

EML4-ALK enrichment using selective PCR and next generation sequencing is shown in FIG. 1. The EML4-ALK fusion variant was detected at all allelic fractions (AFs) tested (FIG. 2), illustrating that selective PCR consistently amplifies as few as 2.5 molecules of fusion DNA, as indicated by 100% detection at 0.0625% AF (4000 input copies). The sequence obtained by the selective PCR method matched the expected breakpoint (FIG. 2A), indicating the selective nature of the method. The specificity of the method was 100%, with no additional fusions detected in any of the samples tested and no fusion calls made in samples that do not contain fusion DNA (0% AF). The median read depth of the EML4-ALK fusion (FIG. 3) across the range of AFs shows a decrease in reads obtained by selective PCR that correlates with the decrease in AF, indicating linear amplification of the gene fusion.
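The figure of 2.5 molecules follows from multiplying the input copies per replicate by the allelic fraction. A minimal Python sketch of that arithmetic for the dilution series of Example 1 is given below; the function and constant names are illustrative only, and the values are expected averages, not measured copy numbers.

```python
INPUT_COPIES = 4000  # input copies per replicate, as stated in Example 1

# Diluted allelic fractions tested in Example 1, in percent
ALLELIC_FRACTIONS_PCT = [1.0, 0.5, 0.25, 0.125, 0.0625]


def expected_fusion_copies(input_copies, allelic_fraction_pct):
    """Expected number of fusion-bearing copies in one replicate."""
    return input_copies * allelic_fraction_pct / 100.0


for af in ALLELIC_FRACTIONS_PCT:
    copies = expected_fusion_copies(INPUT_COPIES, af)
    print(f"{af}% AF: {copies} expected fusion copies per replicate")
```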

Example 2: Detection of ROS1-CD74 Variant

A synthetic gBlock containing a ROS1 fusion sequence (based on a sequence reported in the literature: Seki, Mizukami and Kohno, Biomolecules, 2015, 5, 2464-2476) was synthesized by IDT and was sheared using the Covaris to achieve an average size of 150 bp. The gBlock was added to sheared (average 188 bp) human placental DNA (Bioline) to achieve an allelic fraction of 1%.

Each sample was split into two replicates, each containing a total of 4000 input copies. PCR amplification was performed on the two replicates using the ROS1 primer panel (table 2). Each PCR contained 25 uL DNA, 27.5 uL Platinum SuperFi 2× Master Mix (Invitrogen) and 2.5 uL of the ROS1 primer pool (for primer concentration see table 2). PCR cycling followed the manufacturer's instructions. The PCR product was cleaned up using SPRIselect reagent (Beckman Coulter B23319) following the manufacturer's protocol. DNA was eluted in 18 uL and a second PCR using indexed Illumina primers was performed. Each PCR contained 15 uL DNA, 17.5 uL Platinum SuperFi 2× Master Mix (Invitrogen) and 2.4 uL indexed Illumina primers. PCR cycling followed the manufacturer's instructions. The PCR product was cleaned up once using SPRIselect reagent (Beckman Coulter B23319) following the manufacturer's protocol. Indexed samples from different replicates were pooled into a tube containing 10 uL 10 mM Tris-HCl pH 8. Samples were size selected for 195-350 bp using a 2% Agarose Dye Free cassette and marker L on the Pippin Prep (Sage Science), following the manufacturer's instructions. Size selected DNA was quantified by qPCR using a KAPA Library quantification kit (KAPABIOSYSTEMS), following the manufacturer's instructions. Quantified libraries were sequenced on the NextSeq500 Illumina platform and data analysis was performed.

ROS1-CD74 enrichment using selective PCR and sequencing on the NextSeq platform is shown in FIG. 4. The sequence of the ROS1-CD74 fusion breakpoint is known in the field (Seki, Mizukami and Kohno, Biomolecules, 2015, 5, 2464-2476) and was synthesised into a double stranded DNA fragment (FIG. 5). The method detected the fusion breakpoint with two primer pairs, CD74_E6-I6_10/ROS1_I32_E33_414 and CD74_E6_I6_9/ROS1_I32_E33_414 (FIG. 6A). The sequence read obtained for each primer pair is shown and is a 100% match with the synthesised, published ROS1-CD74 breakpoint (FIG. 6B), highlighting that the selective PCR method can amplify a fusion breakpoint with multiple primer combinations and can accurately identify the sequence of a fusion breakpoint.

Example 3: Detection of ROS1-CD74 Variant Using Sequential Amplification

The same synthetic ROS1 fusion gBlock at 1% allelic fraction as was used in Example 2 was tested. Each sample was split into two replicates, each containing a total of 4000 input copies. Linear amplification of the template was performed on the two replicates using only the ROS1 forward primer panel. Each reaction contained 25 uL DNA, 27.5 uL Platinum SuperFi 2× Master Mix (Invitrogen) and 2.5 uL of the ROS1 forward primer pool. Cycling followed the manufacturer's instructions. The product was cleaned up once using SPRIselect reagent (Beckman Coulter B23319) following the manufacturer's protocol. DNA was eluted in 18 uL and a first PCR using an i5 adapter forward primer and the ROS1 reverse primer pool was performed. Each PCR contained 10 uL DNA, 25 uL Platinum SuperFi 2× Master Mix (Invitrogen), 2.5 uL of the i5 adapter forward primer and 2.5 uL of the ROS1 reverse primer pool. Cycling followed the manufacturer's instructions. The PCR product was cleaned up once using SPRIselect reagent (Beckman Coulter B23319) following the manufacturer's protocol. DNA was eluted in 18 uL and a second PCR using indexed Illumina primers was performed. Each PCR contained 15 uL DNA, 17.5 uL Platinum SuperFi 2× Master Mix (Invitrogen) and 2.4 uL indexed Illumina primers. PCR cycling followed the manufacturer's instructions. The PCR product was cleaned up once using SPRIselect reagent (Beckman Coulter B23319) following the manufacturer's protocol. Indexed samples from different replicates were pooled into a tube containing 10 uL 10 mM Tris-HCl pH 8. Samples were size selected for 195-350 bp using a 2% Agarose Dye Free cassette and marker L on the Pippin Prep (Sage Science), following the manufacturer's instructions. Size selected DNA was quantified by qPCR using a KAPA Library quantification kit (KAPABIOSYSTEMS), following the manufacturer's instructions. Quantified libraries were sequenced on the NextSeq500 Illumina platform and data analysis was performed.

Example 4: Matching Sequences to a Reference Database

Once sequence reads have been demultiplexed, adaptors have been trimmed and reads have been merged, they are compared against a database of all primers used in the fusion assay. Any sequencing reads containing the sequence of a primer designed to a 5′ partner at the start and that of a 3′ partner at the end are identified as potential fusion reads and carried forward for further analysis. The table of primers also contains a list of the expected sequences downstream from each primer (based on the primer binding site in the targeted region as opposed to another part of the genome) and the number of bases that need to match following the end of the primer in order to attain either low or high confidence that the sequence being read belongs to the potential fusion partner. A fusion may be called when the sequence from at least one side is identified with high confidence as belonging to a possible fusion partner (e.g. EML4) and the other side is identified as belonging with at least low confidence to a fusion partner (e.g. ALK). A fusion might only be called if this is either detected in duplicate reactions or if there are 2 or more reads where both sides are high confidence.
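The matching and calling logic described above can be sketched, purely for illustration, in Python as follows. This is not the analysis pipeline of the present disclosure. The `Primer` record and its field names, the storage of each primer in the orientation in which it appears in the merged read, the counting of contiguous matching bases from the primer boundary, the keying of calls on the gene pair (a fuller implementation would likely key on the breakpoint) and the toy sequences are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primer:
    gene: str    # e.g. "EML4" or "ALK"
    side: str    # "5p" (found at read start) or "3p" (found at read end)
    seq: str     # primer sequence as it appears in the merged read (assumed)
    inward: str  # expected bases adjacent to the primer, in read orientation
    low_n: int   # matching bases needed for low confidence
    high_n: int  # matching bases needed for high confidence


def _matching_run(observed, expected):
    """Count leading positions at which observed and expected agree."""
    n = 0
    for o, e in zip(observed, expected):
        if o != e:
            break
        n += 1
    return n


def _confidence(run, primer):
    if run >= primer.high_n:
        return "high"
    if run >= primer.low_n:
        return "low"
    return None


def classify_read(read, primers):
    """Return (gene5, gene3, conf5, conf3) for a potential fusion read, else None."""
    for p5 in (p for p in primers if p.side == "5p"):
        if not read.startswith(p5.seq):
            continue
        for p3 in (p for p in primers if p.side == "3p"):
            if not read.endswith(p3.seq):
                continue
            body = read[len(p5.seq):len(read) - len(p3.seq)]
            conf5 = _confidence(_matching_run(body, p5.inward), p5)
            # The 3' side is checked from the primer boundary inward,
            # so both strings are reversed before comparison.
            conf3 = _confidence(_matching_run(body[::-1], p3.inward[::-1]), p3)
            # One side must be high confidence, the other at least low.
            if "high" in (conf5, conf3) and None not in (conf5, conf3):
                return (p5.gene, p3.gene, conf5, conf3)
    return None


def call_fusion(reads_rep1, reads_rep2, primers):
    """Gene pairs seen in both replicates, or with >= 2 reads high on both sides."""
    hits = [[c for c in (classify_read(r, primers) for r in reads) if c]
            for reads in (reads_rep1, reads_rep2)]
    pairs = [{(g5, g3) for g5, g3, _, _ in h} for h in hits]
    in_both = pairs[0] & pairs[1]
    both_high = [(g5, g3) for h in hits for g5, g3, c5, c3 in h
                 if c5 == c3 == "high"]
    strong = {pair for pair in set(both_high) if both_high.count(pair) >= 2}
    return in_both | strong


# Toy sequences only; these are not real EML4 or ALK sequences.
primers = [
    Primer("EML4", "5p", "AAAACCCC", "GGTTGGTTGG", low_n=4, high_n=8),
    Primer("ALK", "3p", "TTTTGGGG", "CATCATGGAC", low_n=4, high_n=8),
]
read_high = "AAAACCCC" + "GGTTGGTTGG" + "CATCATGGAC" + "TTTTGGGG"
read_low = "AAAACCCC" + "GGTTGGTTGG" + "CATCTTGGAC" + "TTTTGGGG"

print(call_fusion([read_high], [read_low], primers))
```

In this sketch a single read that is high confidence on both sides and seen in only one replicate does not produce a call, which reflects the stricter of the two rules described above.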

It will also be recognized by those skilled in the art that, while the invention has been described above in terms of preferred embodiments, it is not limited thereto. Various features and aspects of the above described invention may be used individually or jointly. Further, although the invention has been described in the context of its implementation in a particular environment, and for particular applications (e.g. cfDNA analysis), those skilled in the art will recognize that its usefulness is not limited thereto and that the present invention can be beneficially utilized in any number of environments and implementations where it is desirable to examine other samples. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the invention as disclosed herein.

1. A method of detecting a genomic fusion event, comprising: (a) splitting a test sample comprising cell-free DNA (cfDNA) from the bloodstream of a human subject into a first replicate and a second replicate; (b) separately combining the first and second replicates with the same primers and polymerase to produce a first reaction mix and a second reaction mix, wherein the primers in the first reaction mix and a second reaction mix both comprise a first set of primers and a second set of primers, wherein: i. the primers of the first set specifically hybridize to the same strand of a first region in a reference human genome; ii. the primers of the second set specifically hybridize to the same strand of a second region in the reference human genome; wherein the first and second regions are on different chromosomes or are on the same chromosome but spaced apart by at least 10 kb; (c) thermocycling the first and second reaction mixes to produce first and second PCR products; (d) independently identifying, by sequencing the product of step (c) or an amplification product thereof, whether the same fusion molecule exists in the first and second PCR products; wherein a fusion molecule that exists in both the first and second PCR products indicates that the human subject has a tumor which comprises a genomic rearrangement that fuses the first region with the second region.
2. The method of claim 1, wherein step (d) comprises identifying the identity of the first and second regions that are fused together as well as the fusion junction in any fusion molecules in the first and second PCR products.
3. The method of claim 1, wherein the primers of the first set have a first 5′ tail and the primers of the second set have a second 5′ tail, and the method comprises amplifying the product of step (b) using universal primers that hybridize to the first or second 5′ tails or their complement prior to sequencing.
4. The method of claim 1, wherein: i. the first set of primers comprises at least 20 primers; ii. the second set of primers comprises at least 20 primers.
5. The method of claim 1, wherein the first region is a kinase gene and the second region is a fusion partner for the kinase gene.
6. The method of claim 5, wherein the kinase gene is the ALK gene and the potential fusion partner for the kinase gene is the EML4 gene.
7. The method of claim 5, wherein the kinase gene is the RET gene and the potential fusion partner for the kinase gene is the TRIM33, CCDC6, KIF5B and NCOA4 genes.
8. The method of claim 5, wherein the kinase gene is the ROS1 gene and the potential fusion partner for the kinase gene is the ROS1, CD74, SLC34A2 and SDC4 genes.
9. The method of claim 5, wherein the kinase gene is the NTRK1 gene and the potential fusion partner for the kinase gene is the SQSTM1 gene.
10. The method of claim 1, further comprising identifying the patient as being a candidate for therapy if the same fusion molecule is produced in both of the first and second PCR products of step (c).
11. The method of claim 1, wherein a fusion molecule that exists in neither or only one of the first and second PCR products indicates that the human subject does not have a tumor comprising the genomic rearrangement.